Closed Bug 416178 Opened 17 years ago Closed 9 years ago

XMLHttpRequest posts set charset= in Content-Type header, breaking some webservers

Categories

(Core :: DOM: Core & HTML, defect)

1.9.0 Branch
defect
Not set
major

Tracking


RESOLVED DUPLICATE of bug 918742

People

(Reporter: alexander.klimetschek, Unassigned)

References

Details

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; de; rv:1.9b3) Gecko/2008020511 Firefox/3.0b3
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; de; rv:1.9b3) Gecko/2008020511 Firefox/3.0b3

Firefox 3 beta 2 and beta 3 RC are (again) sending

Content-Type: application/x-www-form-urlencoded; charset=UTF-8

for a form POST. The "; charset=UTF-8" breaks many webservers, resulting in error or empty pages.

This is an old problem, and it was accepted that sending the charset is not feasible because the number of webservers that can't handle it is too large. It was removed from earlier Mozilla/Gecko versions; see these bugs for example (there are many bugs around that topic):

https://bugzilla.mozilla.org/show_bug.cgi?id=18643
https://bugzilla.mozilla.org/show_bug.cgi?id=7533#c4

I have already seen this current bug, which is about a similar problem with multipart forms:

https://bugzilla.mozilla.org/show_bug.cgi?id=413974



Reproducible: Always

Steps to Reproduce:
1. Create an HTML page with a form with method="post"
2. Open this page with Firefox (3b3)
3. Submit the form and trace the HTTP request
Actual Results:  
This HTTP Header is part of the request:

Content-Type: application/x-www-form-urlencoded; charset=UTF-8

Expected Results:  
This is the lowest common denominator for webservers:

Content-Type: application/x-www-form-urlencoded

Many servers choke on header parameters of the form "; key=value" in the Content-Type header.

I used my profile from FF 2.0 for the FF 3 beta, so I am a normal "upgrader" - no special configs involved, in case that might matter for such an "experimental feature" ;-)
Uh... do you actually have a form that breaks that you can point me to?  There is no code in beta3 to do this for urlencoded form submissions, so I don't see how it could possibly be happening.
Oh, I forgot to mention that it happens when doing an XHR with the prototype javascript lib. I haven't tested normal forms yet - but I thought it happened in all cases ;-)

Here is a (hopefully) self-contained example:

    <form id="login" action="" method="post">
        <h2>Please Login to Post</h2>
        <input type="text" name="username" value="" class="login"/>
        <label for="username"><small>ISID</small></label>

        <input type="password" name="password" value="" class="login"/>
        <label for="password"><small>Password</small></label>
        
        <p class="postmetadata">
            <input class="link" type="submit" value="Submit" />
        </p>
    </form>
    <script src="/blog/js/prototype.js" type="text/javascript"></script>
    <script>
        var login = $("login");
        login.onsubmit = function() {
        
            var url = login.action;
            new Ajax.Updater("formplaceholder", url, {
                evalScripts: true,
                method: "post",
                parameters: login.serialize(true),
                contentType: "application/x-www-form-urlencoded",
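                // Prototype builds the header as contentType plus
                // "; charset=" + encoding when encoding is non-empty, so the
                // empty string below is presumably an attempt to suppress
                // the charset at the library level.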
                encoding: ""
                }); 
            return false;
        }
    </script>
    <div id="formplaceholder"></div>
Oh.  XHR.  That matters!
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE
Tested with Firefox 3.0 beta 4 (on Mac): the fix for bug 413974 (which I assume is included in beta 4) DOES NOT solve this problem.

I think marking this bug as a duplicate was wrong from the beginning:
- bug 413974 is about enctype multipart/form-data
- this bug is about enctype application/x-www-form-urlencoded
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
> DOES NOT solve this problem.

In that case, please put up a testcase showing the bug. That is, a web page that I can visit to see the issue, as well as the source of the server code involved. I did test exactly this situation when writing the patch for bug 413974, and in my testing it is fixed.

>- bug 413974 is about enctype multipart/form-data

The same codepath is used for both enctypes.
Oh, I should have clarified.  The new code WILL always send the charset.  It will put the charset right after the MIME type before all other params (instead of at the end of the MIME type, where it used to go).
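
For illustration, with a hypothetical extra parameter "foo=bar", the placement change described here is roughly:

    Before: Content-Type: application/x-www-form-urlencoded; foo=bar; charset=UTF-8
    After:  Content-Type: application/x-www-form-urlencoded; charset=UTF-8; foo=bar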

I guess your original report claims that even this will break servers.  I suppose we could special-case this one particular MIME type in XMLHttpRequest.  That seems uncalled-for to me, but if the breakage is really that widespread we will have to.

Nominating for blocking, but honestly, I've seen no other reports of this being a problem.
Status: UNCONFIRMED → NEW
Component: General → DOM: Mozilla Extensions
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → general
Yes, it's the mere presence of "; charset=" that will break the webserver.

I have already placed a sample HTML in comment #2. But it's difficult to provide the webserver, because it's proprietary. The parsing bug is fixed there in a new version, but there are many large-scale installations out there. The problem is that parameters cannot be parsed at all for an XHR POST request, which can be tricky to diagnose in Ajax-heavy applications. Updating the servers is not a quick option.

Unfortunately I wasn't able to find an example installation that has a POST form using XHR, but if there is one, it's an extremely difficult problem to spot. I also assume that there are lots of other webserver implementations with the same problem; I guessed so from reading this older issue around the same topic: bug 7533 (the interesting comments start at #34). I know it's over 8 years old (whew!), but I wouldn't be confident that this bug is gone from the web, especially since none of the other main browsers send the charset.
And here is a possible solution that keeps the feature but allows XHR developers to opt out: don't add ";charset=utf-8" automatically if the XHR request only specifies a MIME type. But if the javascript code explicitly sets the content-type to something including the charset, keep it (and do whatever the underlying code might do, e.g. reformatting or security checks).
Safari 3.0.4 behaves that way: if you don't specify "; charset=..." in the XHR content-type header, it won't add it. If you do, it will be passed through.
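
A minimal sketch of that opt-out semantics (the endpoint URL and form fields are hypothetical; this describes the proposed/Safari 3.0.4 behavior, not what Firefox 3 actually does):

    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/login", true);

    // Case 1: no charset given - under the proposal the header would be
    // sent through untouched, with no "; charset=..." appended.
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");

    // Case 2: charset given explicitly - the browser would pass it through
    // (after whatever normalization or security checks it applies).
    // xhr.setRequestHeader("Content-Type",
    //     "application/x-www-form-urlencoded; charset=UTF-8");

    xhr.send("username=alice&password=secret");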
The whole point of adding a charset is that we need to identify the charset of the data, because otherwise neither the sender nor the receiver know what encoding the data is sent in.  We used to do what Safari 3.0.4 does.  It breaks sites, as it happens.
Flags: blocking1.9?
Summary: Form posts set charset= in Content-Type header, breaking many webservers → XMLHttpRequest posts set charset= in Content-Type header, breaking many webservers
Note that the "breaks many web servers" claim could really use some backing up....
Firefox 3 beta 2, 3 and 4 never behaved like Safari 3.0.4. The charset was *always* appended, even if none was set by the javascript code in the XHR object.
Yes, but Firefox 2 behaved like Safari 3.0.4.
I see, didn't know that.

The problem of the unknown charset in browser requests is solved by web frameworks in different ways anyway (we, for example, use a hidden parameter FormEncoding...). It would be good to rely on the charset in the Content-Type header as the standard, but it must be possible to avoid breaking existing servers that are buggy.
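
A minimal sketch of that hidden-parameter convention (all fields other than FormEncoding are hypothetical):

    <form action="/comment" method="post">
        <!-- The server reads this field first and then decodes the
             remaining parameters using the encoding it names. -->
        <input type="hidden" name="FormEncoding" value="UTF-8"/>
        <input type="text" name="comment"/>
        <input type="submit" value="Send"/>
    </form>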
Summary: XMLHttpRequest posts set charset= in Content-Type header, breaking many webservers → XMLHttpRequest posts set charset= in Content-Type header, breaking some webservers
The spec says that when a string is sent it should always be UTF-8 encoded. In that case it seems like the receiver will always know the encoding, and so sending it seems pointless.

The spec also says to append the charset parameter, but maybe we can get the spec changed on that. Mailing them now.
> it seems like the receiver will always know the encoding and so
> sending it seems pointless.

Only if the receiver knows that it's being sent an XMLHttpRequest and can special-case the text processing. Since this whole bug is about using XMLHttpRequest to generate generic form submissions, the receiver knows no such thing in the cases in this bug, or in many other cases. With forms, you can at least select the encoding the receiver expects if absolutely needed, but with XMLHttpRequest you always get UTF-8, so the only way to make it work is to tell the receiver so and fix all receivers to try following the specs instead of just writing an "it happens to work" parser by trial and error.
Per discussion with Jonas, we've decided that we won't change how Firefox works here. If there's evidence that a large set of real websites break due to this, we'd be willing to reconsider this decision, but with only one bug report and no real feel for the number of broken sites it doesn't seem worth undoing this change.
Status: NEW → RESOLVED
Closed: 17 years ago → 16 years ago
Flags: blocking1.9? → blocking1.9-
Resolution: --- → WONTFIX
IMHO there aren't any bug reports yet because FF3 is still in beta and there isn't wide adoption yet.
I'll back up this report, and I'll even show you an example of it in action. 

The server is running helma, and echoes back the post object.

http://clusterfudge.org:8082/mix/command?expr=users.dev.postDemo
We're being hit by this issue using proprietary enterprise systems, notably CA SiteMinder and other security firewalls which have rules to reject Content-Type values with the charset asserted.

The current situation means we cannot use XMLHttpRequest to POST to such servers. We don't have access to the server code, some of which is in firmware and on machines beyond our influence, and without a workaround from Javascript, users are being advised not to use Firefox 3.0 or 3.1.
Paul, have you considered raising your problem in the W3C group working on XMLHttpRequest?  I don't have a problem with adding a way for the page to opt-out of sending the charset header, but I'd be happier doing that in a way allowed by the spec instead of just doing something random.
I'm also having a hard time believing that these firewalls really reject _everything_ with a charset, since so much of the web carries charset params on Content-Types....
Paul, I'd also be interested in knowing your exact situation:  What you're setting Content-Type to, what type of object you're passing to send(), what header you're getting as a result, and what your firewall actually accepts and rejects.
Boris, thanks for replying.

I haven't contacted the W3C WG, but I could if there's a spec issue. My reading of

   http://www.w3.org/TR/2008/WD-XMLHttpRequest-20080415

is that adding "; charset=..." is within the specification but optional, and what there is of

   http://www.w3.org/html/wg/html5/#applicationx-www-form-urlencoded

doesn't demand asserting the charset, but that's not so clear.

From experiments with curl, diffing the working form POST headers (identical apart from the charset string) against the non-working XHR headers, I'm pretty convinced our issue is directly attributable to the Content-Type value not being *exactly* "application/x-www-form-urlencoded".

The difficulty with Firefox 3.0 and 3.1 appending the charset is that we have no mechanism to not send it. I'd suggest either reverting or giving us an option to remove it, ideally from Javascript, given this behavior is neither backwards compatible nor exhibited by any other browser we can find.

Firefox's current behaviour is to assert a charset of UTF-8, regardless of whether one is missing or supplied with a different value. We'd like to be able to programmatically send no charset value. I'd also note that this new behavior introduced by Firefox 3.0 is for something easily supplied by the Javascript caller, in a cross-browser way, if they choose to assert the charset as UTF-8.

I'd offer more detail on the actual implementations causing the issue, but they're security gateways and single signon services deployed inside a large enterprise, and as you can imagine, this is a sensitive area.
Paul, you're not looking at the latest spec version.  The latest one, cleverly not linked from anywhere useful, is at http://dev.w3.org/2006/webapi/XMLHttpRequest/#send

For what it's worth, I raised the issue with the relevant working group.
In case you want to follow up, this is the '[XHR] Some comments on "charset" in the Content-Type header' thread in the public-webapps@w3.org mailing list.
And again for what it's worth, I think that getting the spec changed here would benefit from hard data that this is the only way to deal with the problem.  I realize that you may not be willing to provide this data in public, but the W3C has provisions for private communication of sensitive information for spec editors, I believe.
Our server (4d_WebStar_D/7.8) does not parse form variables when any charset value is added to Content-Type. So the Firefox 3 behavior of adding the charset broke the pages using multipart/form-data POST with XMLHttpRequest. Our code could not access the submitted form data.

I was able to work around the problem by using sendAsBinary() instead of send() when the browser is Firefox 3.
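
A minimal sketch of that workaround, assuming the Firefox-only sendAsBinary() extension (the URL and body here are hypothetical):

    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/submit", true);
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    var body = "a=1&b=2";
    if (typeof xhr.sendAsBinary === "function") {
        // Firefox 3.x: sendAsBinary() takes a string of bytes and
        // reportedly does not go through the code path that appends
        // "; charset=UTF-8" to the Content-Type header.
        xhr.sendAsBinary(body);
    } else {
        xhr.send(body);
    }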
Richard, you need to fix your server to actually follow the HTTP specification....
Boris, unfortunately I don't have access to the source code that controls this.  We are using an old version and are unable to upgrade or switch systems right now. 
The workaround will keep us for awhile.
 
I guess the thing that bothers me about the Firefox3 behavior is that it changes a value specifically set using setRequestHeader().  That seems like an odd thing to do.
It's not that odd in cases where that value is inconsistent with other data we have, for what it's worth...  Say if you explicitly set a charset other than UTF-8, and we encode the data as UTF-8.

In any case, please take spec issues to the W3C?
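
For illustration, a sketch of the inconsistency case described above (the endpoint and body value are hypothetical):

    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/submit", true);
    // The author claims ISO-8859-1 ...
    xhr.setRequestHeader("Content-Type",
        "application/x-www-form-urlencoded; charset=ISO-8859-1");
    // ... but the DOMString body is actually encoded as UTF-8 on the wire,
    // so Firefox rewrites the charset parameter to match:
    //   Content-Type: application/x-www-form-urlencoded; charset=UTF-8
    xhr.send("name=J\u00FCrgen");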
(In reply to comment #34)
> It's not that odd in cases where that value is inconsistent with other data we
> have, for what it's worth...

I think this is just too much "magic". If you set request headers directly, you expect them to be used - even if you do it wrong, i.e. when the body is actually in a different charset.

What is the advantage of the current implementation? For most systems none, since they rely on different ways to pass the charset to the server (hidden form values etc., and in most cases this is UTF-8). So if those systems are to switch to the proper way of putting the charset in the Content-Type header, they should be able to choose that way on their own - when their servers and firewalls are ready to handle that format.
Alexander, you're saying that Mozilla should send malformed HTTP requests, violating the HTTP spec, just because the page author asked it to?  I don't think so.

> What is the advantage of the current implementation?

Much better functioning with cross-site XMLHttpRequest, where the server and the XMLHttpRequest caller are completely independent.
(In reply to comment #36)
> Alexander, you're saying that Mozilla should send malformed HTTP requests,
> violating the HTTP spec, just because the page author asked it to?  I don't
> think so.

Well, not setting the charset in the Content-Type header is not violating the HTTP spec. Also, using XHR as a page developer is more like using an HTTP client lib than a user-driven browser, so you need ways to control what is actually sent from your code.
OS: Mac OS X → All
Hardware: PowerPC → All
Whiteboard: See comment 31 for a workaround
Version: unspecified → 1.9.0 Branch
Component: DOM: Mozilla Extensions → DOM
The W3C spec explicitly states that charset is not allowed for application/x-www-form-urlencoded:

http://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm
None of the text you link to talks about the request headers involved.

Though it does suggest that servers shouldn't use the Content-Type request header for charset info.  However in practice some do...
The linked spec says that the charset is not allowed for this MIME type. This would suggest that "application/x-www-form-urlencoded; charset=utf-8" is not a valid MIME type string.

Also, it seems that Firefox adds the charset even when the application explicitly sets the Content-Type to "application/x-www-form-urlencoded" (no charset) via setRequestHeader(). The XMLHttpRequest spec (http://www.w3.org/TR/XMLHttpRequest/#dom-xmlhttprequest-send) specifies two cases where the user agent should modify the Content-Type when sending the XHR data:

1. If a Content-Type header is in author request headers and its value is a valid MIME type that has a charset parameter whose value is not a case-insensitive match for encoding, and encoding is not null, set all the charset parameters of that Content-Type header to encoding.

2. If no Content-Type header is in author request headers and mime type is not null, append a Content-Type header with value mime type to author request headers.

This does not include the case where the application has set the Content-Type _without_ a charset. Firefox still mangles the content-type in this case.
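
A minimal sketch of that uncovered case (the echo endpoint is hypothetical):

    var xhr = new XMLHttpRequest();
    xhr.open("POST", "/echo-headers", true);
    // Explicit Content-Type, no charset parameter: per the spec steps
    // quoted above, this should be sent through unmodified.
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    xhr.onload = function () {
        // Expected on the wire:
        //   Content-Type: application/x-www-form-urlencoded
        // Observed in affected Firefox versions:
        //   Content-Type: application/x-www-form-urlencoded; charset=UTF-8
        console.log(xhr.responseText);
    };
    xhr.send("a=1&b=2");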
When using Amazon S3's signed PUT requests, this bug causes the Content-Type header to not match what is required (in this case it should be exactly the string "application/json", as in the signature, but is "application/json; charset=UTF-8" instead), and thus the request fails because the signatures do not match.

I don't understand why this is marked RESOLVED WONTFIX - this is obviously a bug, prevents sending perfectly reasonable requests that are within the spec, and does so by "magically" changing the headers from what was explicitly requested. At a minimum there must be a way to "opt-out" of this behavior.
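
A sketch of that S3 failure mode (the bucket URL and signed query string are placeholders, not a working request):

    var xhr = new XMLHttpRequest();
    // The pre-signed URL's signature covers exactly "application/json".
    xhr.open("PUT", "https://example-bucket.s3.amazonaws.com/key?...", true);
    xhr.setRequestHeader("Content-Type", "application/json");
    // Firefox actually sends "application/json; charset=UTF-8", so the
    // server-side signature check fails and the request is rejected.
    xhr.send(JSON.stringify({hello: "world"}));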
Doesn't the spec define the behavior here?
It does, in step 4 of https://xhr.spec.whatwg.org/#dom-xmlhttprequest-send, which is in line with what comment 40 outlines, though that comment points to a document we should not look at.

Removing the whiteboard comment since we removed sendAsBinary() so that no longer works as a workaround.

Reopening since we should tweak our behavior here per the specification, although I suggest that bz signs off on it since he largely instigated the whole charset business in the first place, iirc.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Whiteboard: See comment 31 for a workaround
Hi,

I went through the entire thread. It's important to realize that in most cases it's impossible to convince the API provider to make changes like this, because they do not want to touch an existing stable system.

Our company caters to the BPO industry, which has all kinds of browsers in the enterprise market, and we vow to support all of them. But the API that we are using shows the same problem with charset=UTF-8. It's very important that Mozilla resolves this as soon as possible. Safari, IE and Chrome all work fine; only Firefox seems to have this charset=UTF-8 issue.

I request you not to leave us handicapped in this regard. For a start, may I ask how many reports on this thread you need before deciding that this must be fixed? At least that would be a starting point on this issue.
This is a known bug and we want to fix it - it's required for conformance with the spec. Note how several sub-tests here fail because Gecko adds too many ";charset=UTF-8" parameters:
http://w3c-test.org/XMLHttpRequest/send-content-type-charset.htm
(This is a test in the official W3C XMLHttpRequest test suite)

I think this is a dupe of bug 918742, which is about fixing failures on that test. Since it's a real world problem we should nudge the priority of that bug upwards, but unfortunately I don't know when somebody will get around to fixing it.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 9 years ago
Resolution: --- → DUPLICATE
Component: DOM → DOM: Core & HTML