Closed Bug 236858 Opened 21 years ago Closed 12 years ago

Repeating GET requests when charset <meta> appears late

Categories

(Core :: DOM: HTML Parser, defect)

Platform: x86
OS: All
Type: defect
Priority: Not set
Severity: critical

Tracking


RESOLVED DUPLICATE of bug 61363

People

(Reporter: pdsimic, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(3 files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007

After finishing development of a dynamic web site using PHP and server-side sessions (via cookies) and uploading it to the server, I noticed that some of its functionality is broken; specifically, the pages that use sessions to track the user's current progress. By examining the web server's log files, I traced the problem to repeated GET requests made by my Mozilla browser while testing the site from my machine over a 56k modem connection.

The question is: why is Mozilla repeating GET requests to the web server? Is there a way to increase the HTTP (connection?) timeouts? I tried to fiddle with the configuration in about:config, but even after setting the ...timeout... parameters to enormous values, nothing improved. The described repetition of GET requests happens in about 90% of cases, tested with both Mozilla 1.5 and 1.6. Any help? Thanks in advance.

BTW, here are the HTTP headers from a sample hand-made GET request to one of the "problematic" pages (the headers are identical on my development server and my production server, so there are no differences between them to cause these problems):

HTTP/1.1 200 OK
Date: Mon, 08 Mar 2004 21:03:52 GMT
Server: Apache
Set-Cookie: admsid=23fe9156addcf1af54d82827cc124a43; path=/admin; domain=xxx.xxxxxxxx.xxx
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Content-Type: text/html

Reproducible: Always

Steps to Reproduce:
1. Fetch the problematic page from my production server

Actual Results:
Repeated GET requests to my production server.

Expected Results:
Just one GET request. My 56k modem connection is quite stable. ;)
Reporter: can you please provide an HTTP log per the instructions on this site: http://www.mozilla.org/projects/netlib/http/http-debugging.html Feel free to email the log file directly to me if you would like its contents kept private; attaching the log file to this bug is otherwise fine :)
> Content-Type: text/html

You don't set a charset here. Does the page set it? Or are you relying on the browser's charset autodetect? Do the repeated GETs go away if you disable charset autodetect?
> You don't set a charset here. Does the page set it? Or are you relying on the
> browser's charset autodetect? Do the repeated GETs go away if you disable
> charset autodetect?

I have the following line in my page's header, so it's setting the charset:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">

In my Mozilla's settings, Navigator -> Languages -> Character Coding -> Default Character Coding is set to "Western (ISO-8859-1)". Excuse me for a stupid question, but how do I disable charset autodetect?
Browser log, captured while reproducing the problem
> Reporter: can you please provide an HTTP log per the instructions on this site:

I've sent the requested browser HTTP log, reproducing the described problem. Note that it's a bzip2'ed file. In this log file, http://omega.homelab.net/ is just my start page, while the pages causing problems are under http://mp3.rskoming.net/admin/; you can see repeated requests for /admin/index.php, /admin/add.php and finally for /admin/logout.php. Since it could have something to do with local caching, I've tried to reproduce the problem with all four caching settings ("Compare the page in local cache with the page on network" - or however it's worded ;), and it persists with all four. BTW, please don't think I'm violating many laws by distributing MP3s around; this is just a local archive. ;)
> Excuse me for a stupid question, but how do I disable charset autodetect?

View menu > Character Encoding > Autodetect > (Off)
> > Excuse me for a stupid question, but how do I disable charset autodetect?
>
> View menu > Character Encoding > Autodetect > (Off)

Just for info, it was (and still is) turned Off...
Attachment #143372 - Attachment mime type: text/plain → application/x-bzip2
> I've sent the requested browser HTTP log, reproducing the described problem.
> Note that it's a bzip2'ed file.

Any clues out of it?
I was also suffering this problem with my cart. I spent about 40 minutes trying to fix it on the server side, and then decided to check the request via LiveHTTPHeaders. It was then I noticed that the file was being re-requested. After a quick search on Bugzilla I found this bug, noticed the comments regarding the charset (I set one in the source, but not via the HTTP headers), and sent the charset via the headers. It now works fine. The PHP code to send the charset via headers, if anyone is interested, is:

header("Content-type: text/html; charset=ISO-8859-1");
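For anyone else hitting this, a minimal sketch of where that call goes, assuming a plain PHP page (the charset and markup are only examples, not taken from the site above). The important part is that header() runs before any output is emitted:

<?php
// Minimal sketch: declare the charset in the HTTP response header.
// header() must be called before any output, otherwise the
// Content-Type header can no longer be changed.
header("Content-Type: text/html; charset=ISO-8859-1");
?>
<html>
<head>
<!-- Keep any meta declaration consistent with the HTTP header above -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Cart</title>
</head>
<body>
<p>Cart contents go here.</p>
</body>
</html>

The <meta> becomes optional once the HTTP header carries the charset, but if you keep it, make sure it agrees with the header.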
Confirming. One of my colleagues has managed to reproduce this reliably on Firefox 1.0, WinXP. (OS->all) Setting the content-type header does indeed resolve the problem. I'll attach HTTP traces in a minute.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Log demonstrating the problem. The request in question is:

GET /template/startquote.launch?PolicyType=PC&CompanyName=template&brandName=default HTTP/1.1
Log for a similar request, again for:

GET /template/startquote.launch?PolicyType=PC&CompanyName=template&brandName=default HTTP/1.1

(apologies for unnecessary wrapping)

The only server-side change between these two requests was to explicitly set a "Content-Type: text/html;charset=UTF-8" response header, rather than relying on the default. I have reason to believe (though no detailed logs to back up my hunch) that this problem is restricted to GET. But then, if we were duplicating POSTs, people would be yelling rather loudly because of all the site bustage.
This sounds like it's caused by a <meta> tag specifying a charset; where's the bug?
Further observations: the double GET only occurs if the page has a character-encoding meta tag which differs from the encoding selected in the Firefox View -> Encoding menu. If the browser encoding matches the encoding in the page, there's only a single request, and everything is hunky-dory. [The reporter's web page is ISO-8859-2 (Central European); ours are UTF-8; both our browsers are set to ISO-8859-1 (Western).] The reporter is sending "Cache-Control: no-store"; so are we.

<wild speculation, based on minimal knowledge of moz's networking/parsing code, apologies if I'm off-target by a radian or two>
Moz receives the page and starts parsing. Once the parser has gotten as far as the html meta http-equiv content-type tag, [something] realises it's using the wrong encoding, drops everything on the floor, and re-requests the incoming data from [something upstream] using the *correct* encoding.
What "should" happen: [something upstream] munges the incoming data into the correct encoding and sends it to the parser, which starts parsing again.
What's actually happening: [something upstream] issues a second HTTP GET to the originating web server. (Possibly because of the draconian cache-control header?)
</wild speculation>
Christian: Apologies, my last comment was posted before I grokked yours. It's past midnight here. Were you thinking of bug 61363? Based on bz's comment 2 above, that's certainly the one he had in mind. I'm pretty sure I had charset autodetect turned off when I hit the problem. That said, it does sound awfully similar. Two different ways to trigger the same problem?
Comment 14 describes exactly what happens. I can't tell whether bug 61363 applies only to autodetection or also to the more general case of current encoding != meta encoding. Personally I don't consider this a bug, but this is not my code... Not a necko issue, since necko can't cache no-store pages; moving this to intl.
Assignee: darin → smontagu
Component: Networking: HTTP → Internationalization
QA Contact: core.networking.http → amyy
Depends on: latemeta
Bug 61363 does also include the case of charset specified by meta, but it doesn't (or didn't) happen when the <meta> is in the first 2048 bytes of the document. Is that the case here?
Simon: I can confirm that, in our case, the meta tag most definitely IS within the first 2048 bytes of the document content (it actually runs from roughly character 380 to 450, which, unless my understanding of UTF-8 is way off, means it's in the 380-450 byte range, given that there aren't any heavy-duty characters likely to require multiple bytes that early in the document). I'll try to roll a "simple" test case in the near future, but it's probably going to be a day or two until I get time, and it's probably going to be JSP-based when it does happen.
I guess what's relevant is not so much the byte count, but whether it is in the first packet (or the first 2048 bytes of the first packet, or something).
I recently posted in the bug forum about a similar problem where pages are for some reason loading twice, and I was referred to this bug: http://forums.mozillazine.org/viewtopic.php?t=209461

A SUMMARY: We've discovered a problem with our CLRStore.com website when using Firefox. The only thing I can deduce at this point is a Firefox browser bug. I've done a ton of testing on this, and I simply can't explain why Firefox mysteriously loads the following page twice while Internet Explorer only loads it once, as it should: https://www.clrstore.com/cgi-bin/store.cgi

Here's what I mean: when adding a new product to the shopping cart (a product you haven't already added to the cart under the same session), Firefox incorrectly loads the script twice, causing two products to be added to the cart instead of one. I know this because, right after clicking the "add to cart" link, you see a message stating the product already exists in the shopping cart from the first time the page was loaded. When a product already exists, the quantity is added to the existing order. If I add a product that I've already added to the cart in the past (and then removed again), the page only loads once like it should, and the product is subsequently only added once as well.

I tested the same thing in Internet Explorer and to my surprise it worked just fine. Why would the same website run differently in separate browsers? And why would Firefox reload the page once it has already parsed to right around the middle of the store.cgi script? What seems to reload the page is either the Perl "index" function or a delayed reaction by Firefox; I've narrowed it down to the exact spot in the script by trial and error. This looks like a Firefox bug to me and I'm certain it isn't my script. I've checked, and there is no way the product could be added twice without reverse processing the script page or reloading the entire script page.

Anyone have any ideas? I can post portions of the code if needed, but I don't know if it would help. This problem breaks my script and I'm amazed it is still around.
I guess your problem would probably be fixed by sending a charset in the HTTP header. At a guess, you currently have it in a <meta> tag, do not send a charset in the HTTP header, and send headers telling the browser not to cache the page. That makes Mozilla reload the document once it sees that <meta>. Let me note that this was mentioned some comments above, too...
I just tested adding "Content-Type: text/html; charset=UTF-8" to my Perl script instead of the "Content-Type: text/html" I had before, and things now work great! Just wanted to follow up on the comment I left a few days ago.
I've been searching for the answer to this problem for a while now, and having finally found this page, I am scratching my head that some people seem to think it isn't really a bug. Why should the browser send two page requests just because the HTTP header charset is absent or contradicts the meta tag? Assuming this really is the cause, it is strange for the browser to behave this way. And as others have noted, it causes havoc on sites where page requests are logged in a database or a cookie for some reason. I can't think of any reason why a browser should make two page requests just because of the charset encoding. It is still happening in the latest release of Firefox, and it makes no sense.
It makes two requests because it needs to reinterpret the data in the other character set and doesn't have the data locally, so it re-gets it from the server. (This is why it does that; it doesn't necessarily mean this is good behaviour.)
(In reply to comment #24)
> It makes two requests because it needs to reinterpret the data in the other
> character set and doesn't have the data locally, so it re-gets it from the
> server. (This is why it does that; it doesn't necessarily mean this is good
> behaviour.)

Well... yes, I figured that out. I meant my question in a more philosophical sense: is this really a desirable feature? It seems to me that a better way to handle this would be for Firefox to have some form of priority list which declares whether to use the HTTP header or the meta tag in the event that they are absent or contradictory. But Firefox having to make a whole new request to the server? I just can't see this as anything other than a bug.
It certainly has that priority list. If there's an HTTP header, it's used. The real question is: if you have to guess what the charset is, and you can only do that a few thousand characters into the page, and the page asked not to be cached, what do you do?
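Expressed as code, the precedence being described is roughly the following. This is a purely illustrative sketch in PHP, not Gecko code, and all names are invented for the example:

<?php
// Purely illustrative -- NOT Gecko code. A runnable sketch of the
// charset precedence described in the comments above.
function choose_charset($httpCharset, $metaCharset, $fallbackGuess) {
    if ($httpCharset !== null) {
        return $httpCharset;   // charset from the Content-Type HTTP header always wins
    }
    if ($metaCharset !== null) {
        return $metaCharset;   // a <meta> charset is only discovered after parsing starts
    }
    return $fallbackGuess;     // otherwise fall back to the default/autodetected guess
}

// If the winner differs from the guess the parser started with, and the page
// was served with "no-store", the data cannot be re-decoded from cache --
// hence the second GET.
echo choose_charset(null, "UTF-8", "ISO-8859-1"); // prints "UTF-8"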
I think this is related to my bug: https://bugzilla.mozilla.org/show_bug.cgi?id=359690

Which is really a pain; I've got at least 10 of these administrations set up. What combination of headers worked? I've tried the following in different combinations, with no luck:

//header("Cache-control: private"); // IE 6 Fix.
//header("Content-type: text/html; charset=ISO-8859-1"); // FF 2.0 Fix
//header("Content-Type: text/html; charset=UTF-8"); // FF 2.0 Fix
(In reply to comment #25)
> Well... yes, I figured that out. I meant my question in a more philosophical
> sense: is this really a desirable feature? It seems to me that a better way to
> handle this would be for Firefox to have some form of priority list which
> declares whether to use the HTTP header or the meta tag in the event that they
> are absent or contradictory. But Firefox having to make a whole new request to
> the server? I just can't see this as anything other than a bug.

As far as I know, standards, or at least best practices, say that the charset value in the HTTP Content-Type header should be used above a <meta> element (considering the name is http-*EQUIV*, i.e. it should be in the HTTP header anyway). In the general sense, this behaviour is expected when there's no charset specified in Content-Type, or when it differs from the <meta> declaration.

As others have said in fewer words than I have, the problem is that ideally you need to know the charset to be able to parse the page (otherwise how do you interpret the character data?). If the browser is unsure of the intended charset, it's stuck unless it guesses. So if it then finds a declaration in the page (in a <meta>) after it has started parsing, what should it do? Continue parsing using its guessed charset, or do things "properly" (to avoid charset mismatch issues) by reloading and reparsing the page using the charset it found in the <meta> the first time round?

I'm not saying this is a good thing, but you can't withhold the charset info from an HTTP user agent and then expect it to magically know the charset before it loads and parses the page. As others have said, doing things properly by specifying the correct charset in the HTTP headers removes this problem completely; the browser knows the charset before it starts parsing the page.

However, I can appreciate how this affects some sites which either can't or won't specify the charset in the HTTP Content-Type header. One possible solution might be to have the browser keep the page in memory (only request it from the server once), and if it finds a charset that conflicts with or differs from the guessed default, reparse the page using the newly learned charset while it's still in memory. I'm not a programmer, so I can't comment on how this could be implemented or how difficult it would be.
Further clarification to my previous comment (#29): if the HTTP headers say not to cache the page in any way, this might still be possible if it's all considered part of a single request from the user. By that I mean it should all be treated as a single user request of the page (regardless of HTTP, which should be a single request in a perfect world - but if you're not going to specify the charset in the HTTP headers, what do you expect?), unless the user requests a page refresh or some other operation which would normally invoke HTTP activity. However, I've got the nasty feeling that reparsing in memory would probably break at least something.
QA Contact: amyy → i18n
It doesn't look like this ever got resolved, although I see a few recent posts by others relating to multiple GET requests for images. I'm also experiencing this on my server. Here is a LiveHTTPHeaders request log:

http://mra.advanceday.com/link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T

GET /link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T HTTP/1.1
Host: mra.advanceday.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

HTTP/1.1 200 OK
Date: Tue, 25 Jan 2011 13:48:16 GMT
Server: Apache
Expires: Tue, 25 Jan 2011 14:48:16 GMT
Content-Length: 15093
Connection: close
Content-Type: image/jpeg; charset=binary

----------------------------------------------------------

http://mra.advanceday.com/link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T

GET /link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T HTTP/1.1
Host: mra.advanceday.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

HTTP/1.1 200 OK
Date: Tue, 25 Jan 2011 13:48:17 GMT
Server: Apache
Expires: Tue, 25 Jan 2011 14:48:17 GMT
Content-Length: 15093
Connection: close
Content-Type: image/jpeg; charset=binary
Comment #31 said:
> Content-Type: image/jpeg; charset=binary

I believe this is somewhat confused. If it's binary data (i.e. *not* text), there's no charset to specify. Read the HTTP/1.1 spec (RFC 2616; text-search for "binary", with quotes); AFAICT "binary" is one possible option for Transfer-Encoding or Content-Encoding, but not for a Content-Type charset, where it would be nonsensical in my understanding. Try configuring the server to respond without specifying a charset for binary data, thus:

Content-Type: image/jpeg

> Expires: [an hour in the future, from time of request]

I'd also question why (for images) you have Expires: set to only an hour in the future. Unless the images actually display dynamic data (generated from elsewhere) which really does change *every* hour, images (especially) should be set to something like a year in the future (relative to the time of request, naturally) to enable caching (RFC 2616 & http://www.mnot.net/cache_docs/). If the image displayed on (a|some) particular page(s) really needs to be different, then use a different *source* URI in the <img> element in the page markup. Best of both.
The charset=binary is being generated by a MIME type identifier (such as "file" on Linux), via PHP; I'm not setting it explicitly. The one-hour Expires is just what you said: the graphic is generated and has a lifetime of one hour. The real question, though, is why FF is doing a double request in the first place.
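Regardless of the root cause, if the image really is emitted by a PHP script, the suggestion above (drop the charset parameter, keep an appropriate Expires) could look like this minimal sketch; generate_link_image() is a hypothetical stand-in for whatever actually produces the JPEG:

<?php
// Minimal sketch (not the reporter's actual script): serve the generated JPEG
// with an explicit Content-Type and no charset parameter, since binary types
// take no charset.
$jpeg = generate_link_image();   // hypothetical stand-in for the real generator

header('Content-Type: image/jpeg');
header('Content-Length: ' . strlen($jpeg));
// One-hour lifetime, matching the comment above; lengthen it if the image
// changes less often.
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 3600) . ' GMT');
header('Cache-Control: public, max-age=3600');
echo $jpeg;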
Is Simon Montagu still working on this, or should we assign this bug to someone else?
Don't use GET to change state on your server; clients, intermediaries, spiders, etc. can and will make automated requests, pre-fetch, retry failed requests, etc. Use POST.
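A minimal sketch of that advice for a PHP-style cart; add_to_cart() and the field names are hypothetical, not taken from any script in this bug:

<?php
// Illustrative sketch only: mutate cart state on POST, never on GET, so a
// repeated or prefetched GET cannot add an item twice.
session_start();

if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['product_id'])) {
    add_to_cart(session_id(), $_POST['product_id']);   // hypothetical helper
    header('Location: cart.php', true, 303);            // redirect after POST
    exit;
}
// GET requests fall through to simply displaying the cart.

With this pattern, a duplicated or prefetched GET only redisplays the cart; the state change happens once, on the POST.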
Same issue with FF4, Apache, and content like images/CSS. What is especially annoying (at least for me) is that the second request does NOT provide the session cookie. Example:

GET /img_bg.gif HTTP/1.1
Host: 192.168.1.9
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
DNT: 1
Connection: keep-alive
Cookie: sessionid=*********/************

HTTP/1.1 200 OK
Date: Fri, 10 Jun 2011 04:01:29 GMT
Server: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny10 with Suhosin-Patch
Last-Modified: Fri, 10 Jun 2011 03:40:58 GMT
ETag: "a289-9b-4a5535671a680"
Accept-Ranges: bytes
Content-Length: 155
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/gif

GIF89a...........;

GET /img_bg.gif HTTP/1.1
Host: 192.168.1.9
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
DNT: 1
Connection: keep-alive

HTTP/1.1 200 OK
Date: Fri, 10 Jun 2011 04:01:29 GMT
Server: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny10 with Suhosin-Patch
Last-Modified: Fri, 10 Jun 2011 03:40:58 GMT
ETag: "a289-9b-4a5535671a680"
Accept-Ranges: bytes
Content-Length: 155
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: image/gif

GIF89a...........;
What's the status of fixing this bug? Any plans? It's hurting our servers and bandwidth.
What is the status of this bug? Same issue with FF v16.0.2 when requesting a dynamically generated image: Firefox sends the request twice.
This is a long, long-standing bug and really should be fixed. I come across it fairly regularly. Imagine the bandwidth being wasted due to this bug, double-downloading images.
There are a lot of people receiving these updates, perhaps if we all vote for this bug it will make a difference?
What's the status in fixing this bug?
Flags: needinfo?(smontagu)
As far as I know nobody is working on this, nor on bug 61363 which it depends on.
Assignee: smontagu → nobody
Component: Internationalization → HTML: Parser
Flags: needinfo?(smontagu)
First of all, as far as I can tell the problem as originally reported is simply a duplicate of bug 61363, so I am marking this as a duplicate. The problem described in comment 31, comment 37 and comment 39 is most likely a different problem arising from prefetching images.

The original problem can be 100% avoided by the Web page author by using HTML correctly, as required by the HTML specification. There are three different solutions, any one of which can be used:

1) Configure your server to declare the character encoding in the Content-Type HTTP header. For example, if your HTML document is encoded as UTF-8 (which it should be), make your server send the HTTP header

Content-Type: text/html; charset=utf-8

instead of

Content-Type: text/html

This solution works with any character encoding supported by Firefox.

OR

2) Make sure that you declare the character encoding of your HTML document using a "meta" element within the first 1024 bytes of your document. That is, if you are using UTF-8 (which you should), start your document with

<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8>
<title>Whatever</title>

etc., and don't put massive comments, scripts or other stuff before <meta charset=utf-8>. This solution works with any character encoding supported by Firefox except the UTF-16 encodings, which you shouldn't be using anyway.

OR

3) Start your document with a BOM (byte order mark). If you're using UTF-8, make the first three bytes of your file 0xEF, 0xBB, 0xBF. You probably should not use this method unless you're sure that the software you are using won't accidentally delete these three bytes. This solution works only with UTF-8 and UTF-16, but you should not be using UTF-16 anyway, which is why I did not give the magic bytes for UTF-16.

As for the other problem related to prefetching images, please see https://developer.mozilla.org/en-US/docs/HTML/Optimizing_Your_Pages_for_Speculative_Parsing

Finally, Firefox 4 had a bug which made it load images between <noscript> and </noscript> even when scripting was enabled. That bug has been fixed.
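For a PHP-generated page, the three solutions boil down to something like the following minimal sketch (the title and body are placeholders; use whichever of the declarations suits your setup rather than necessarily all of them at once):

<?php
// Minimal sketch for a PHP-generated page.
// Solution 1: declare the encoding in the Content-Type HTTP header.
header("Content-Type: text/html; charset=utf-8");

// Solution 3 (optional, UTF-8 only): emit a BOM as the very first output bytes.
// Only do this if you are sure nothing downstream strips it.
echo "\xEF\xBB\xBF";
?>
<!DOCTYPE html>
<html>
<head>
<!-- Solution 2: an early meta charset, well inside the first 1024 bytes -->
<meta charset=utf-8>
<title>Whatever</title>
</head>
<body>
<p>Page content.</p>
</body>
</html>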
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Summary: Repeating GET requests → Repeating GET requests when charset <meta> appears late