Closed Bug 1676331 Opened 4 years ago Closed 4 years ago

Request fails with Windows Authentication and bigger payload

Categories

(Core :: Networking, defect, P2)

Firefox 81
defect

Tracking

()

RESOLVED FIXED
88 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox86 --- wontfix
firefox87 --- wontfix
firefox88 --- fixed

People

(Reporter: martin.gubis, Assigned: valentin)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(8 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0

Steps to reproduce:

  1. Configure Windows Authentication site hosted on Windows Server 2012 R2 on IIS8
  2. Create a page, which creates hidden field with lot of content (1 MB, it was aspx viewstate in my case)
  3. Load the page, authenticate properly and wait at least 2 minutes (the time is important - if I do the postback sooner, it works)
  4. Do the postback on the page

Actual results:

Postback ends with error page "connection resetted".

Expected results:

Postback should successfully reach the server.

Above mentioned setup works properly on Chrome and also when the Firefox uses Fiddler as the proxy.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Networking
Product: Firefox → Core

Can you make a http log:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

please use a clean profile, because logs may contain cookies.

Flags: needinfo?(martin.gubis)

Please find the log on the MMIT cloud (it was too big to be attached, it will be available until 14th of December, 2020): https://cloud.moenkemoeller.it/index.php/s/QCMzqjqw5C7iAjE

Please let me know if there is anything else I can be helpful with.

Flags: needinfo?(martin.gubis)

The connection is reset by the server. Do you have a server log to look for reasons why the connection is reset.

Anyway, probably this is not firefox issue. Maybe the server log will give use some clue.

Flags: needinfo?(martin.gubis)
Attached image Firefox - smaller payload.png (deleted) —
Attached image Chrome - bigger payload.png (deleted) —
Attached image Firefox - bigger payload.png (deleted) —
Attached file (deleted) —

I have done some Wiresharking on the line between client and server. The problem seems to be somehow related to Firefox - the test works without problem on Chrome, only Firefox runs into connection reset.

First I have checked the Firefox access with smaller payload of the POST (Firefox - smaller payload.png). The actual post communication starts with No. 135, Firefox receives 401 several times (actual requests are not decoded well by Wireshark, that's the reason why they are not visible). Anyway, Firefox behaves correctly and receives 200 at the end.

Then I have checked the Chrome access with big payload of the POST (Chrome - bigger payload.png). After couple of 401s, Chrome receives correctly 200.

Then I have checked the Firefox access with big payload of the POST (Firefox - bigger payload.png). Communication stops prematurely after receiving 401 from the server. As you can see from Chrome, client is supposed to continue communication. I have drilled down to see what is actually happening in the point when the communication stops.

Drilldown of the communication end is in the "Firefox - Bigger payload - EOC drilldown.png". The actual communication is immediately preceded by the "401 Unauthorized, NTLMSSP_CHALLENGE" (so what you see in the drilldown is actually Firefox response to server). Firefox sends the POST request (packet No. 2594), but it seems, that the actual body is missing and is never sent by Firefox (the contents of the packet No. 2594 are attached in the "Firefox - Last sent request.txt"). The advertised Content-Length: 3000329 is never sent over the line. Server just acknowledges packet receive (TCP level) - Packet number 2595 and then waits for body to come. It never comes and server terminates connection (packet #2612) after the timeout.

This seems to be Firefox related problem - I think the body should be sent in the last request or it shouldn't be advertised (I am no expert in HTTP, so I am not sure whether this is possible at all).

Please let me know if you need more detailed information or Wireshark dumps. Thanks for looking into the issue.

Flags: needinfo?(martin.gubis)

Also, would it be possible to make 'Firefox - Last sent request.txt' private attachment? Or could it be deleted?

I am not sure whether the information in there are not sensitive. Thank you.

Attachment #9188498 - Attachment is private: true

(In reply to Martin Gubis from comment #11)

Also, would it be possible to make 'Firefox - Last sent request.txt' private attachment? Or could it be deleted?

I am not sure whether the information in there are not sensitive. Thank you.

I've made it a private attachment.

In addition, could you try to make a http log as said in comment #2?
Note that the log should not have the content of authorization header, but could have cookie headers. If you want, you could send the log to my email address directly.
Thanks.

Flags: needinfo?(martin.gubis)

Thanks for the feedback. Http log was already created - please find it in comment #3 (I was not able to attach it since it is too big).

Flags: needinfo?(martin.gubis)

So, I can't find anything wrong from the log.
As said in comment #4, we need the information from the server side to tell us what's wrong.

Martin, could you try to figure out why the server reset the connection?

Flags: needinfo?(martin.gubis)

I have already done the analysis, please see complete analysis in comment #10.

To summarize:

The analysis was done on the TCP level using Wireshark. I am no expert in HTTP protocol, but it seems, that the Firefox simply doesn't send the whole request (please see Firefox - Last sent request.txt - it contains contents sent to the server - actually last packet sent to the server). Firefox advertises 3MB content length, but never sends the content. Server waits for content to arrive for 120s - nothing is sent from the Firefox, so it resets the connection.

Flags: needinfo?(martin.gubis)

The severity field is not set for this bug.
:valentin, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(valentin.gosu)

I just took a look at the logs. Indeed, there seems to be a problem in Firefox. Do you happen to know if this worked with older versions of Firefox?
It seems we send the request, upload the entire body, but then fail to send the entire body when the auth retry happens:

2020-11-13 08:59:14.299000 UTC - [Parent 11488: Main Thread]: D/nsHttp sending progress and status notification [this=00000218801CD000 status=804b000a progress=0/0]
  804b000a = STATUS_WAITING_FOR
2020-11-13 08:59:14.300000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::OnStartRequest [this=00000218801CD000 request=000002187C9E7D00 status=0]
2020-11-13 08:59:14.300000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::ProcessResponse [this=00000218801CD000 httpStatus=401]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::ContinueProcessResponse3 [this=00000218801CD000, rv=804b0001]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannelAuthProvider::ProcessAuthentication [this=000002187F28D790 channel=00000218801CD750 code=401 SSLConnectFailed=0]
nsHttpChannel @218801cd000 --> nsHttpChannelAuthProvider @2187f28d790
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannelAuthProvider::PrepareForAuthentication [this=000002187F28D790 channel=00000218801CD750]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannelAuthProvider::GetAuthenticator [this=000002187F28D790 channel=00000218801CD750]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannelAuthProvider::GetCredentialsForChallenge [this=000002187F28D790 channel=00000218801CD750 proxyAuth=0 challenges=Negotiate]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannelAuthProvider::GetIdentityFromURI [this=000002187F28D790 channel=00000218801CD750]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannelAuthProvider::GetAuthenticator [this=000002187F28D790 channel=00000218801CD750]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannelAuthProvider::GetCredentialsForChallenge [this=000002187F28D790 channel=00000218801CD750 proxyAuth=0 challenges=NTLM]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannelAuthProvider::GetIdentityFromURI [this=000002187F28D790 channel=00000218801CD750]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::ConnectionRestartable this=00000218801CD000, restartable=1
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::OnDataAvailable [this=00000218801CD000 request=000002187C9E7D00 offset=0 count=1293]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::OnStopRequest [this=00000218801CD000 request=000002187C9E7D00 status=0]
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp OnStopRequest 00000218801CD000 requestFromCache: 0 mFirstResponseSource: 0
2020-11-13 08:59:14.301000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel 00000218801CD000 has a strongly framed transaction: 1
2020-11-13 08:59:14.302000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::DoAuthRetry [this=00000218801CD000, aTransWithStickyConn=0000021878C2A810]
2020-11-13 08:59:14.302000 UTC - [Parent 11488: Main Thread]: V/nsHttp HttpBaseChannel::SetRequestHeader [this=00000218801CD000 header="Cookie" value="ASP.NET_SessionId=ca4gv4je2j3ho3yzqwel4n31" merge=0]
2020-11-13 08:59:14.302000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::ContinueDoAuthRetry [this=00000218801CD000]
2020-11-13 08:59:14.302000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::DoConnect [this=00000218801CD000]
2020-11-13 08:59:14.302000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::DoConnectActual [this=00000218801CD000, aTransWithStickyConn=0000021878C2A810]
2020-11-13 08:59:14.302000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::SetupTransaction [this=00000218801CD000, cos=64, prio=-20]
2020-11-13 08:59:14.302000 UTC - [Parent 11488: Main Thread]: E/nsHttp nsHttpChannel 00000218801CD000 created nsHttpTransaction 000002187A809810
nsHttpChannel @218801cd000 --> nsHttpTransaction @2187a809800
2020-11-13 08:59:14.302000 UTC - [Parent 11488: Main Thread]: D/nsHttp nsHttpChannel::ContinueOnStopRequestAfterAuthRetry [this=00000218801CD000, aStatus=0 aAuthRetry=1, aIsFromNet=1, aTransWithStickyConn=0000021878C2A810]
2020-11-13 08:59:14.303000 UTC - [Parent 11488: Main Thread]: D/nsHttp sending progress and status notification [this=00000218801CD000 status=804b0005 progress=693/3001014]
  804b0005 = STATUS_SENDING_TO
2020-11-13 08:59:14.303000 UTC - [Parent 11488: Main Thread]: D/nsHttp sending progress and status notification [this=00000218801CD000 status=804b000a progress=0/0]
  804b000a = STATUS_WAITING_FOR

Hopefully this is an HTTP bug, not an NTLM bug.
Provisionally taking the bug to have a look.

Assignee: nobody → valentin.gosu
Severity: -- → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(valentin.gosu)
Priority: -- → P2
Whiteboard: [necko-triaged]

Martin, do you know if sending a smaller payload makes the use case work?
The answer is useful for tracking down the cause of the bug.

Flags: needinfo?(martin.gubis)

Valentin, smaller payload works correctly. Problem surfaces only when there is a bigger payload. Should I check any specific payload size?

Flags: needinfo?(martin.gubis)

I didn't check with older version of Firefox. I suspect, that Firefox ESR 78 has the similar behaviour (problem was originally reported on the customer's side where I wasn't able to do proper analysis), but I didn't verify it. Is there any special version I should check against?

It seems we do attempt to rewind the upload stream and send all of it, but maybe that fails for some reason?

It would be useful to know what the upload is. Is it a file? Just a bunch of text you embed into the form?

(In reply to Martin Gubis from comment #20)

I didn't check with older version of Firefox. I suspect, that Firefox ESR 78 has the similar behaviour (problem was originally reported on the customer's side where I wasn't able to do proper analysis), but I didn't verify it. Is there any special version I should check against?

There's the mozregression tool you can use to find a range:
https://mozilla.github.io/mozregression/

I would for something in the past 2-3 years - mozregression -g 2018-01-01 -M classic
The -M classic option would first check if the build on that date is good or bad before starting the bisection. If that fails you can go backwards by a few years, but I doubt it will start working 🙂

First phase of bisection

Valentin, I have successfully done bisection on nightly builds (I have done it in two phases - both are attached). The problem is regression, which happened in the commits on 23rd of October 2019 (version from 22nd of October 2019 works well and version from 23rd of October 2019 fails).

Also, if you have a look on the bisection report, the version from 26th of October 2019 crashed during the postback (I had to skip it).

The most of the payload (like 99%) is one big input (hidden). The problem doesn't surface when the big file is part of the payload - it seems, that only the big FormData causes trouble (but I didn't test this thoroughly, so I might be wrong with this - I don't have a proper test setup for this scenario).

The customer (big German enterprise) is using Firefox ESR 78.5.0 (32-bit). It would be great to have this regression fixed also in this ESR release. They are relying heavily on the Windows Authentication and are facing issues with apps, which have big postback payloads. Can you arrange it to have the fix also in the ESR 78 or should I create separate ticket for it?

Thank you for doing the regression. Future note, it's better to post the output rather than a screenshot 🙂 I eventually managed to extract the link from the photo.
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=47d7c18620c845c3a42&tochange=f915da42169e70bdd67

I suspect the culprit might be Bug 1583700, but I can't be sure.
Could you help us further by testing with the build here:
https://archive.mozilla.org/pub/firefox/nightly/2019/10/2019-10-25-16-40-00-mozilla-central/
Check if setting the browser.tabs.documentchannel pref to false in about:config makes the bug go away.

Thanks!

Flags: needinfo?(martin.gubis)

Thanks for the feedback. I have tested the version from the repository (I didn't know how to run it in portable mode, so I have installed it to virtual machine - it was reporting, that the update is ready, but I made sure, that about showed 72.0a.1 nightly version when I was doing the tests):

  1. Testing with default config (fresh install): didn't work
  2. Had to reinstall (updates got installed during Firefox restart) - made sure I am on 72.0a.1 after reinstall
  3. Switching browser.tabs.documentchannel to false: works correctly

Would be the disabling of the document channel valid workaround for customer? Does it anyhow influence browsing experience on other pages?

Thanks.

Flags: needinfo?(martin.gubis)

I don't think the pref is available in esr78.

Matt, I'm trying to understand what's going on here, but without access to a test setup I'm only speculating.
Can you think of any interaction between document channel and the upload stream that would cause us to fail when retrying?
I suspect that rewinding the upload stream fails for some reason, but I can't say why it only happens with the document channel, and only with a large payload.

Flags: needinfo?(matt.woodrow)
Keywords: regression
Regressed by: 1583700
Has Regression Range: --- → yes

One difference is that HttpChannelChild uses AutoIPCStream directly whereas DocumentChannel uses the IPDLParamTraits helpers for nsIInputStream*. That's caused issues before in bug 1595578, but I'm not aware of any remaining problems.

Iirc for larger payloads we switch to using an actor for blob serialization, rather than just streaming the bytes across inline, so that might be the difference as to when this becomes an issue.

Flags: needinfo?(matt.woodrow)

Thanks for the info. It may be the blob serialization that is causing issues here 🤔
Andrea, are there any known issues with rewinding a serialized (blob) input stream?

Flags: needinfo?(amarchesini)

I suspect that the code doesn't react properly to the situation when the mUploadStream doesn't implement ISeekableStream interface. It just continues silently without having the upload stream rewinded. Does the instance of mUploadStream used with big payload implement this interface?

Is there anything new about the topic?

Is there anything new about the topic? If Mozilla doesn't plan to fix this issue (or look into it), please mark this as won't fix so I can recommend to the customer (big German enterprise) to use Chrome instead.

Summary of the issue:

Content page is creating a blob larger that 256Kb in size and posting it.
In the parent process, we do an authRetry after the authentication completes so we try to rewind the stream and retry the request.
Due to the fact that the deserialized stream does not implement nsISeekableStream we fail to rewind the stream.
The channel will hang, because we attempt to read from the stream and there is no more data there.

To improve user experience my approach here is to fail the channel if we fail to rewind the streams.
This will cause the load to fail we stay on the original page and the user can submit again if necessary.

In a follow-up bug we can investigate whether it's possible to make this work. Introducing an API that allows us to rewind the stream asynchronously might be a solution, but it's not that easy to implement.

Blocks: 1696386
Attachment #9206649 - Attachment description: Bug 1676331 - Assert that we can rewind the upload stream in nsHttpChannel::ContinueDoAuthRetry r=#necko → Bug 1676331 - Fail request in nsHttpChannel::ContinueDoAuthRetry if rewinding the upload stream fails r=#necko
Attachment #9206650 - Attachment description: Bug 1676331 - Test that submitting a form with a large blob works r=baku → Bug 1676331 - Test that submitting a form with a large blob works with authentication r=baku,#necko
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/c8321424eb47 Fail request in nsHttpChannel::ContinueDoAuthRetry if rewinding the upload stream fails r=necko-reviewers,dragana https://hg.mozilla.org/integration/autoland/rev/5fabd9f4bb61 Test that submitting a form with a large blob works with authentication r=necko-reviewers,dragana
Regressions: 1696894
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch
Flags: needinfo?(amarchesini) → in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: