Closed Bug 1405696 Opened 7 years ago Closed 7 years ago

XMLHttpRequest using UTF-8 as input to the URL parser causes compatibility issues

Categories

(Core :: DOM: Core & HTML, defect, P2)

57 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 + fixed
firefox58 --- fixed

People

(Reporter: yfdyh000, Assigned: wisniewskit)

References

Details

(Keywords: regression)

Attachments

(2 files)

Attached file Fiddler_2_Full.txt (deleted) —
STR:
1. Open http://xukezheng.cbrc.gov.cn/ilicence/
2. Input the "安" to "机构名称:" field.
3. Click the "查询" button located on the upper right corner.


Actual results:
Empty result on the page, as fullName=%E5%AE%89 instead of fullName=%B0%B2.


Expected results:
Some results with name contain "安".

No broken for sites. Chrome 61 work fine.


Regression range:
https://hg.mozilla.org/integration/autoland/rev/8ff32bdb356d

Bug 1322874 - Remove nsIURI::originCharset
The site uses XHR to POST a request. The document encoding isn't supposed to affect XHR POST requests, but apparently it is web-incompatible.
Flags: needinfo?(annevk)
What exactly is passed to send()?
Flags: needinfo?(annevk)
(In reply to Anne (:annevk) from comment #2)
> What exactly is passed to send()?

Minimized STR:
1. Go to http://xukezheng.cbrc.gov.cn/ilicence/ (or any other gbk-encoded pages)
2. Open web console.
3. Type `
xhr=new XMLHttpRequest();xhr.open("POST", "getLicence.do?useState=3&organNo=&fatherOrganNo=&province=&orgAddress=&organType=&branchType=&fullName=安&address=&flowNo=", true);xhr.send("start=0&limit=10");
` into the console.
4. Switch to Network tab and see the fullName parameter.

AR: fullName: 安
ER: fullName: %B0%B2
Flags: needinfo?(annevk)
Thanks, so the problem is that we use UTF-8 for the URL parser invoked from XMLHttpRequest rather than the encoding of the document.

Given the results for XMLHttpRequest/open-url-encoding.htm in web-platform-tests it would probably be more expedient for Firefox and the standard to change behavior rather than hope everyone else will. :-(

Thomas, is this something you want to take on?
Flags: needinfo?(annevk) → needinfo?(wisniewskit)
Component: Networking → DOM
Priority: -- → P2
Summary: Form field has been sent as utf-8 instead of originCharset (gbk) resulting in empty results on specific site → XMLHttpRequest using UTF-8 as input to the URL parser causes compatibility issues
I'm interested yes, but also on PTO next week and swamped this week, so I'll understand if someone wants to beat me to it.
Flags: needinfo?(wisniewskit)
Hey Thomas, this is currently tracking 57. would you consider this a blocker?
Flags: needinfo?(wisniewskit)
I'll tentatively say yes, but I'm not at all sure how widespread the issues are, so I can't really be sure. Do we know if this affects more sites, or if xukezheng.cbrc.gov.cn is a site we would block release for on its own?
Flags: needinfo?(wisniewskit) → needinfo?(jmathies)
We only have this report afaik.
Flags: needinfo?(jmathies)
OS: Unspecified → All
Hardware: Unspecified → All
This is now tracked by https://github.com/whatwg/xhr/issues/159. I can land the specification change once we update Gecko and "correct" the corresponding web-platform-tests test.

(I found that other browsers do the exact same broken thing for fetch(). I wrote a test for that and filed bugs against the other browsers hoping we can still keep the correct behavior there.)
Should we consider backing Bug 1322874 out, or is it important for other things in 57?
Flags: needinfo?(annevk)
Flags: needinfo?(VYV03354)
Flags: needinfo?(annevk) → needinfo?(valentin.gosu)
(In reply to Mike Taylor [:miketaylr] (58 Regression Engineering Owner) from comment #10)
> Should we consider backing Bug 1322874 out, or is it important for other things in 57?

We should probably back it out.  The issue is here is that we no longer inherit the charset of the baseURI.
We can fix _this_ issue in XMLHttpRequestMainThread.cpp [1] by passing the charset of the document into NS_NewURI.
But there might be other cases where the charset causes a problem.

[1] http://searchfox.org/mozilla-central/rev/31606bbabc50b08895d843b9f5f3da938ccdfbbf/dom/xhr/XMLHttpRequestMainThread.cpp#1568
Flags: needinfo?(valentin.gosu)
(In reply to Valentin Gosu [:valentin] from comment #11)
> We should probably back it out.  The issue is here is that we no longer
> inherit the charset of the baseURI.
> We can fix _this_ issue in XMLHttpRequestMainThread.cpp [1] by passing the
> charset of the document into NS_NewURI.
> But there might be other cases where the charset causes a problem.

If we backout, we should update the URL spec to introduce the charset concept because that "problem" is expected per the current spec (e.g. fetch incompatibility). I'm not sure we should do it unless more actual web compat problems are found.
Flags: needinfo?(VYV03354)
If other browsers don't use utf-8 there, is there a reason we want to take the compatibility risk in 57?
emk, the URL Standard does have an encoding concept: https://url.spec.whatwg.org/#url-parsing. Using that concept for XMLHttpRequest is https://github.com/whatwg/xhr/issues/159 as I pointed out above. (It's also used throughout the HTML Standard quite a bit, albeit indirectly.)

I filed bugs against other browsers not using UTF-8 for fetch(). Chrome has already fixed their bug on that: https://bugs.chromium.org/p/chromium/issues/detail?id=772390.
(In reply to Anne (:annevk) from comment #14)
> emk, the URL Standard does have an encoding concept:
> https://url.spec.whatwg.org/#url-parsing. Using that concept for
> XMLHttpRequest is https://github.com/whatwg/xhr/issues/159 as I pointed out
> above. (It's also used throughout the HTML Standard quite a bit, albeit
> indirectly.)
> 
> I filed bugs against other browsers not using UTF-8 for fetch(). Chrome has
> already fixed their bug on that:
> https://bugs.chromium.org/p/chromium/issues/detail?id=772390.

Yeah, therefore I support a fix that is limited to XHR rather than a full backout that will introduce an incorrect (non-UTF-8) behavior to fetch.
(In reply to Masatoshi Kimura [:emk] from comment #15)
> (In reply to Anne (:annevk) from comment #14)
> > emk, the URL Standard does have an encoding concept:
> > https://url.spec.whatwg.org/#url-parsing. Using that concept for
> > XMLHttpRequest is https://github.com/whatwg/xhr/issues/159 as I pointed out
> > above. (It's also used throughout the HTML Standard quite a bit, albeit
> > indirectly.)
> > 
> > I filed bugs against other browsers not using UTF-8 for fetch(). Chrome has
> > already fixed their bug on that:
> > https://bugs.chromium.org/p/chromium/issues/detail?id=772390.
> 
> Yeah, therefore I support a fix that is limited to XHR rather than a full
> backout that will introduce an incorrect (non-UTF-8) behavior to fetch.

Are you able to take on such a targetted XHR fix, Thomas?
Flags: needinfo?(twisniewski)
I believe so. Have we decided to block 57 on this?
Flags: needinfo?(twisniewski) → needinfo?(overholt)
(In reply to Thomas Wisniewski from comment #17)
> I believe so. Have we decided to block 57 on this?

Awesome! Yes, we did. Blame miket.
Flags: needinfo?(overholt)
Anne, I have the obvious fix ready (the one you suggested in comment 4), but there's still a compat problem: for the "lone surrogate" test in test_url_encoding.html, we would now emit this: "&%2365533;" while Chrome/Webkit emit "%26%2355357%3B" and Edge emits "?". So who is correct? Should we also be escaping the ampersand and semicolon?
Flags: needinfo?(annevk)
Since it may be worth landing my partial fix (even if the "lone surrogate" test is still failing), I'm doing a try run for what I have so far: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0fa3fc699e27ee9ed8b4a33121de1f4afc49636
Alright, since the only worrying test failure in that try run isn't actually related (the wpt10 WebRTC failure; it fails locally for me with or without my patch), I think the patch is ready for a review (and a decision on whether we need to fix remaining issue I pointed out in comment 19, or whether that is better-addressed in a follow-up).
(In reply to Thomas Wisniewski from comment #19)
> Anne, I have the obvious fix ready (the one you suggested in comment 4), but
> there's still a compat problem: for the "lone surrogate" test in
> test_url_encoding.html, we would now emit this: "&%2365533;" while
> Chrome/Webkit emit "%26%2355357%3B" and Edge emits "?". So who is correct?
> Should we also be escaping the ampersand and semicolon?

Please file a separate bug. It does not have to do with this encoding problem.
Comment on attachment 8919921 [details]
Bug 1405696 - pass the document encoding to NS_NewURI for XMLHttpRequest;

https://reviewboard.mozilla.org/r/190856/#review196352

::: dom/xhr/XMLHttpRequestMainThread.cpp:1554
(Diff revision 1)
>      baseURI = mBaseURI;
>    } else if (responsibleDocument) {
>      baseURI = responsibleDocument->GetBaseURI();
>    }
> +
> +  mozilla::NotNull<const Encoding*> originCharset = UTF_8_ENCODING;

nit: I believe this code is already in the mozilla namespace, so I think you can just write `NotNull<>`.

::: testing/web-platform/meta/XMLHttpRequest/open-url-encoding.htm.ini:4
(Diff revision 1)
> +[open-url-encoding.htm]
> +  type: testharness
> +  [lone surrogate]
> +    expected: FAIL

Can you add a `bug: <link>` line here to reference the follow-up bug to investigate this?

::: testing/web-platform/tests/XMLHttpRequest/open-url-encoding.htm:23
(Diff revision 1)
>        }, "percent encode characters");
>        test(function() {
>          var client = new XMLHttpRequest()
>          client.open("GET", "resources/content.py?\uD83D", false)
>          client.send()
> -        assert_equals(client.getResponseHeader("x-request-query"), "%EF%BF%BD")
> +        assert_equals(client.getResponseHeader("x-request-query"), "%26%2355357%3B")

I can't claim to be an expert on encodings, so I'm trusting these values are correct.
Attachment #8919921 - Flags: review?(bkelly) → review+
I've created the follow-up bug, so this is ready for check-in.
Keywords: checkin-needed
Assignee: nobody → wisniewskit
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/4f95cfe243f3
pass the document encoding to NS_NewURI for XMLHttpRequest; r=bkelly
Keywords: checkin-needed
I honestly have no clue what could make this patch leaky (and why other code that does the same elsewhere, like in the HTML parser, isn't leaky).

bkelly, emk, would you know who could help me find out?
Flags: needinfo?(wisniewskit)
Flags: needinfo?(bkelly)
Flags: needinfo?(VYV03354)
I guess the leak is a false positive, because Linux64 asan bc8 did not fail in the next changeset:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=faf8d2ab8dcaff4d78eb722c94ba7727ea20f67b

Servo servo PR #18809 would be the actual cause.
Flags: needinfo?(VYV03354)
Ah, that makes sense, thanks emk.

aryx, I can't imagine those wpt10 failures being from my patch, as I get the same failures in local tests whether my patch is applied or not. Based on bug 1404733, they have been happening for a while. I also can't see why my patch would cause them to fail more often, because the traces I see in the log have nothing to do with XMLHttpRequest (as far as I can tell).
Flags: needinfo?(bkelly)
Thomas, per the algorithm in https://url.spec.whatwg.org/#query-state Firefox is correct (&%2365533;). However, judging from Ben's review comments it seems you aligned the test with Chrome/Safari?
Flags: needinfo?(annevk)
Yes, I did. But since Firefox is correct I will update my patch tomorrow and re-request check-in with the correct test expectation (and I'll close bug 1410139, since we're doing the right thing).
Okay, I've updated the patch's test expectation to reflect that we're correct in the "lone surrogate" case.

Since it doesn't seem that this patch is the one causing any of the failures we saw, I'm re-requesting checkin.
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/2dd370ac9599
pass the document encoding to NS_NewURI for XMLHttpRequest; r=bkelly
Keywords: checkin-needed
jgraham, do you have any idea why my patch would be triggering bug 1404733, or do you think it's not related? I can't quite tell what the problem is causing those WebRTC failures, nor why it would be affected by my XMLHttpRequest patch.
Flags: needinfo?(wisniewskit) → needinfo?(james)
bug 1404733 is not related. If you click on the failure and then on 'Similar Jobs' at the bottom followed by a click on a green job, the panel on the right will also show these "failures" - they are not fatal.
Ah, thanks aryx, I see it now.

The problem is that I should not be using the responsible document's encoding for dedicated workers, but rather I should just fall back on UTF-8 (a trivial change to the patch).

I'll do one more try-run before I try landing again.
Flags: needinfo?(james)
Alright, based on this try-run I think we're good to go this time: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d96616db5a9f8e7b9a4857d5ffbcb8da7a1b470

Re-requesting check-in.
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/cb92520aacde
pass the document encoding to NS_NewURI for XMLHttpRequest; r=bkelly
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/cb92520aacde
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
New regression in 57 that includes a test - please request Beta approval on this when you're comfortable doing so.
Flags: needinfo?(wisniewskit)
Flags: in-testsuite+
Yang, could you confirm if the site if now working properly for you again in the latest nightly builds? Thanks!
Flags: needinfo?(wisniewskit) → needinfo?(yfdyh000)
LGTM.
Status: RESOLVED → VERIFIED
Flags: needinfo?(yfdyh000)
Comment on attachment 8919921 [details]
Bug 1405696 - pass the document encoding to NS_NewURI for XMLHttpRequest;

Approval Request Comment
[Feature/Bug causing the regression]: regressing bug #1322874 (inadvertently changed the URL-parsing behavior of XMLHttpRequest slightly, breaking web-compat).
[User impact if declined]: one Chinese government site is confirmed to be affected by this issue so far (bug 1405696), breaking the site's search functionality. It is presumed that other sites will also be affected.
[Is this code covered by automated tests?]: yes, by the web platform tests (updated by this patch).
[Has the fix been verified in Nightly?]: yes, by myself and the reporter.
[Needs manual test from QE? If yes, steps to reproduce]: no.
[List of other uplifts needed for the feature/fix]: none.
[Is the change risky?]: low risk.
[Why is the change risky/not risky?]: low because it reverts XMLHttpRequest behavior back to what it was before the regressing bug broke compatibility.
[String changes made/needed]: none.
Attachment #8919921 - Flags: approval-mozilla-beta?
Comment on attachment 8919921 [details]
Bug 1405696 - pass the document encoding to NS_NewURI for XMLHttpRequest;

Recent regression, must fix, beta57+
Attachment #8919921 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Thomas Wisniewski from comment #50)
> [Is this code covered by automated tests?]: yes, by the web platform tests
> (updated by this patch).
> [Has the fix been verified in Nightly?]: yes, by myself and the reporter.
> [Needs manual test from QE? If yes, steps to reproduce]: no.

Setting qe-verify- based on Thomas's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: