Closed Bug 1725996 Opened 3 years ago Closed 3 years ago

Enhanced Tracking Protection does not forward cookies in iframes that go to the same domain

Categories

(Core :: Privacy: Anti-Tracking, defect, P2)

Firefox 90
defect

Tracking

()

RESOLVED FIXED
95 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- wontfix
firefox91 --- wontfix
firefox92 --- wontfix
firefox93 --- wontfix
firefox94 --- wontfix
firefox95 --- fixed

People

(Reporter: balloob, Assigned: timhuang)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(3 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:90.0) Gecko/20100101 Firefox/90.0

Steps to reproduce:

In Home Assistant (https://www.home-assistant.io/) we have an ingress features. It proxies all requests for an application via Home Assistant and is protected by Home Assistant authentication. This got broken in a recent update to Firefox when a user has set Enhanced Site Protection to strict.

The feature works as follows:

  • A user is visiting his Home Assistant dashboard on http://homeassistant.local:8123
  • A user navigates to a page that is served via Ingress
  • The Home Assistant UI will create a session and store it as a cookie
  • Home Assistant UI creates an iframe pointed at the same domain at /api/hassio_ingress/<ingress ID> and shows it to the user

The cookie set code is here:

https://github.com/home-assistant/frontend/blob/dev/src/data/hassio/ingress.ts#L7

  document.cookie = `ingress_session=${session};path=/api/hassio_ingress/;SameSite=Strict${
    location.protocol === "https:" ? ";Secure" : ""
  }`;

Actual results:

If the user has Enhanced Site Protecetion set to strict, they are shown a 401 unauthorized because the cookie is not sent alongside with the iframe

Here are the request headers:

GET /api/hassio_ingress/<IDENTIFIER OF INGRESS>/ HTTP/1.1
Host: homeassistant.local:8123
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:90.0) Gecko/20100101 Firefox/90.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Referer: http://homeassistant.local:8123/a0d7b954_esphome
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache

The response headers

HTTP/1.1 401 Unauthorized
Date: Mon, 16 Aug 2021 16:47:38 GMT
Server: Python/3.9 aiohttp/3.7.4.post0
Content-Type: text/plain
Content-Length: 17

If I go to dev tools -> storage -> cookies I do see the cookie is created. It's just not send along

Cookie info with Enhanced Site Protection turned off (and so working):

Domain: homeassistant.local
Expires / Max-Age: Session
HostOnly: true
HttpOnly: false
Path: /api/hassio_ingress/
SameSite: "Strict"
Secure: false

Cookie info with Enhanced Site Protection turned off (no difference):

Domain: homeassistant.local
Expires / Max-Age: "Session"
HostOnly: true
HttpOnly: false
Path: "/api/hassio_ingress/"
SameSite: "Strict"
Secure: false

Expected results:

The cookie should be sent with the iframe request.

The Bugbug bot thinks this bug should belong to the 'Core::Networking: Cookies' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Networking: Cookies
Product: Firefox → Core
Component: Networking: Cookies → Privacy: Anti-Tracking

I can reproduce with on Ubuntu 21.04, Nightly 93.0a1 (2021-08-18).

STR:

  1. Start an instance of homeassistant (I've used their KVM image).
  2. Go through initial setup.
  3. Install the Visual Studio Code addon (http://homeassistant.local:8123/hassio/addon/a0d7b954_vscode/info).
  4. Once installed, click start, then "Open web UI" on the right next to the installation button.
  5. Home assistant navigates to http://homeassistant.local:8123/hassio/ingress/a0d7b954_vscode

Expected:
The visual studio code editor loads.

Actual:
The vscode editor does not load, the iframe remains blank. In the network inspector we can see a 401 for the /api/hassio_ingress/... route.

mozregression points to the patch stack of Bug 1700623: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=5a1efb6ae9b81a1f51ba9cf5182f92c1c5644fd0&tochange=51ac7ee1f56d0aeb3a16291d3f18d33bdaa8cc66

Status: UNCONFIRMED → NEW
Ever confirmed: true

Since this is most likely a regression of Bug 1700623, :farre, could you take a look?

Flags: needinfo?(afarre)
Regressed by: 1700623
Has Regression Range: --- → yes

Looking at the cookies, somehow we set a partition key, even though the cookie is set in the top level via document.cookie.
I've shared a screenshot on Matrix: https://matrix.to/#/!qCoDzuUtTabyiUltYt:mozilla.org/$rPbD_HK8iBi_gvxg87NL1vcJVv0B3TVaognSgN7Kw9M?via=mozilla.org

Severity: -- → S3
Priority: -- → P2

I'm looking into this.

Assignee: nobody → afarre
Flags: needinfo?(afarre)

Paul, I tried to follow the steps from https://www.home-assistant.io/installation/linux, but I wasn't able to install it in virt-manager. Did you just follow those steps?

Flags: needinfo?(pbz)

Yes I've set it up with virt-manager, specifically according to the instructions here: https://www.home-assistant.io/installation/linux#hypervisor-specific-configuration (KVM tab). You don't have to setup the bridge interface for the POC. NAT should be fine.

Feel free to reach out on Slack or Matrix, if it's helpful.

Flags: needinfo?(pbz)
Status: NEW → ASSIGNED

Hi :farre, are you still looking into this?

Flags: needinfo?(afarre)

I've still not been able to reproduce this. I'm having a hard time using KVM with the image. I'll have another go this week. Do you know of another way of reproducing this issue?

Flags: needinfo?(afarre) → needinfo?(pbz)

So far I wasn't able to reproduce the issue elsewhere. Perhaps you could try the VirtualBox or VMWare image?
There is clearly an issue with dFPI. Setting cookie behavior to 4 fixes the issue. We somehow seem to be partitioning a cookie set on the top level. Here is the cookie:

{
  "name": "ingress_session",
  "value": "<sessionId>",
  "isDomain": false,
  "host": "<endpoint>.ui.nabu.casa",
  "rawHost": "<endpoint>.ui.nabu.casa",
  "path": "/api/hassio_ingress/",
  "isSecure": true,
  "expires": 0,
  "expiry": 9223372036854776000,
  "originAttributes": {
    "firstPartyDomain": "",
    "geckoViewSessionContextId": "",
    "inIsolatedMozBrowser": false,
    "partitionKey": "(https,<endpoint>.ui.nabu.casa)",
    "privateBrowsingId": 0,
    "userContextId": 0
  },
  "isSession": true,
  "isHttpOnly": false,
  "creationTime": 1633548081466903,
  "lastAccessed": 1633548082110773,
  "sameSite": 2,
  "schemeMap": 2,
  "SAMESITE_NONE": 0,
  "SAMESITE_LAX": 1,
  "SAMESITE_STRICT": 2,
  "SCHEME_UNSET": 0,
  "SCHEME_HTTP": 1,
  "SCHEME_HTTPS": 2,
  "SCHEME_FILE": 4
}

Note that endpoint is always the same and matches the origin loaded on the top level.
When disabling dFPI the partitionKey remains empty and the cookie is sent to the server.

Flags: needinfo?(pbz)

Did some debugging: The storage principal in CookieCommons::CreateCookieFromDocument has the permission key which matches the top level. Not sure why yet.

So I've finally found out why this happens. When bug 1700623 landed, we introduced a very early call to Document::EffectiveStoragePrincipal(), so early that it is guaranteed to use Document::mPartitionedPrincipal as Document::mActiveStoragePrincipal. This happens because we're creating and initializing a WindowGlobalChild in WindowGlobalChild::Create and need to sync the storage principal to the parent, and I was able to verify that by just introducing:

diff --git a/dom/ipc/WindowGlobalActor.cpp b/dom/ipc/WindowGlobalActor.cpp
index 7548e03f604b..b092b4eb0ca5 100644
--- a/dom/ipc/WindowGlobalActor.cpp
+++ b/dom/ipc/WindowGlobalActor.cpp
@@ -143,6 +143,8 @@ WindowGlobalInit WindowGlobalActor::WindowInitializer(
 
   fields.mIsLocalIP = init.principal()->GetIsLocalIpAddress();
 
+  aWindow->GetEffectiveStoragePrincipal();
+
   // Most data here is specific to the Document, which can change without
   // creating a new WindowGlobal. Anything new added here which fits that
   // description should also be synchronized in

in the tree before bug 1700623 landed, this starts happening.

This doesn't feel like it's the fault of the patch from bug 1700623, but from how the storage principal is cached and given access. WindowGlobalParent needs to know the storage principal at this point, and care has been taken that when we give access, the resulting change in storage principal gets synced, but if it's an error to get the storage principal too early, we're in a bit of a pickle.

:pbz, do you know how to proceed with this? I'd be happy to help out fixing this, but I'd need a few pointers to what the solution should be. Also, huge thanks for fixing so that I could reproduce!

Flags: needinfo?(pbz)

So to be perfectly clear, I'm fairly certain that at the point where we initialize the WindowGlobalChild the partitioned storage principal is the correct principal, since we need to be able to give access later. But somehow it seems that we end up loosing ourselves when this should happen.

I haven't worked that much on the principal handling in anti-tracking yet. Tim, could you advise on a strategy to address this issue?
If you want to reproduce the STR from comment 2 I can provide you with the url / credentials for my test installation.

I'm still wondering why we see this in the home assistant software specifically, i.e. how they end up with this timing.

Flags: needinfo?(pbz) → needinfo?(tihuang)

A very early call to Document::SetCookie should do it. Early enough that storage access hasn't been granted. I wonder if it wouldn't be possible to write a test case for this, I'll give it a quick try.

The issue happens because the third-party check will also fail when it's too early. And the main reason is that the current WindowContext is not available, which looks weird to me that why Docuemnt::SetCookie is called even before the WindowContext is ready.

Farre, Is this supposed to happen?

Flags: needinfo?(tihuang) → needinfo?(afarre)

Here is a quick test script that filters for potentially wrongly partitioned cookies:
https://gist.github.com/Trikolon/fd5c63b6756b83576f821b40a47b46cd
In my production Firefox profile I could only find 2 cookies which match this filter. So it doesn't seem super severe. We should still address this sooner rather than later.
Note that this may also affect other storages.

After a more closer look, it not the case where calling Document::SetCookie before WindowContext is ready, but the document caches the partitioned principal when an early access of Document::EffectiveStoragePrincipal(). In this case, the WindowContext is not available, so the third-party check fails. Hence, the later access Document::SetCookie will use the incorrect partitioned principal.

So it isn't the call to Document::SetCookie that is the problem, and you can't ever call Document::SetCookie before a WindowContext is available, since Document implies WindowContext.

The thing I was hoping to happen was that the storage access grant would happen before the call to Document::SetCookie, because then we'd have the correct storage principal, instead of the cached partitioned one. If this isn't possible, then we need to figure out what the storagePrincipal for a newly initialized WindowContext should be[1]. Tim, deciding what that should do should fix it right? Either it is not the partitioned principal, but we find out how to not use it until we've been able to update WindowGlobalParent.documentStoragePrincipal to the correct one, or we initialize it with the default value, and make sure to clear it to the correct one when granting access, and instead make sure grant access at the correct point? We might even get away with: default initialize to partitioned, then update to regular storagePrincipal (before grant), but when we know that we can call Document::EffectiveStoragePrincipal() without it being too soon. Because I tried to just remove the cache in Document::EffectiveStoragePrincipal() and then the bug goes away.

https://searchfox.org/mozilla-central/source/dom/ipc/WindowGlobalParent.cpp#125

Flags: needinfo?(afarre) → needinfo?(tihuang)

I think we should get the correct storage principal for a newly initialized Window. Given that we already have the document when we first access the Document::EffectiveStoragePrinicpal() in the early initialization stage, we should be able to get the correct StorageAccess for the document. The problem here is that we are relying on the channel of the document to tell if the document is a third party. Otherwise, we will fall back to use the ThirdPartyUtil::IsThirdPartyWindow() to check this and it will fail because the current window context is not available.

To fix this, we can directly check if a window is a third party through its browsingContext. In Bug 1731982, I will introduce an AntiTrackingUtils::IsThirdPartyContext, which is supposed to serve this purpose. And applying following patch can fix this issue.

diff --git a/toolkit/components/antitracking/AntiTrackingUtils.cpp b/toolkit/components/antitracking/AntiTrackingUtils.cpp
index 5d01ae050bfba..0e52775935fe5 100644
--- a/toolkit/components/antitracking/AntiTrackingUtils.cpp
+++ b/toolkit/components/antitracking/AntiTrackingUtils.cpp
@@ -651,7 +651,7 @@ bool AntiTrackingUtils::IsThirdPartyWindow(nsPIDOMWindowInner* aWindow,
   }

   RefPtr<Document> doc = aWindow->GetDoc();
-  if (!doc || !doc->GetChannel()) {
+  if (!doc) {
     // If we can't get channel from the window, ex, about:blank, fallback to use
     // IsThirdPartyWindow check that examine the whole hierarchy.
     nsCOMPtr<mozIThirdPartyUtil> thirdPartyUtil =
@@ -661,6 +661,10 @@ bool AntiTrackingUtils::IsThirdPartyWindow(nsPIDOMWindowInner* aWindow,
     return thirdParty;
   }

+  if (!doc->GetChannel()) {
+    return IsThirdPartyContext(doc->GetBrowsingContext());
+  }
+
   // We only care whether the channel is 3rd-party with respect to
   // the top-level.
   nsCOMPtr<nsILoadInfo> loadInfo = doc->GetChannel()->LoadInfo();
Flags: needinfo?(tihuang)

farre, would you mind that I take over this bug?

Flags: needinfo?(afarre)

No not at all! Thank you for helping me with this!

Assignee: afarre → tihuang
Flags: needinfo?(afarre)
Depends on: 1731982

Did bug 1731982 fix this for 95?

Yes, Bug 1731982 fixes this issue. I am still planning to add a test in this bug.

Flags: needinfo?(tihuang)

Thanks for the fix !

We used to use the ThirdPartyUtil::IsThirdPartyWindow() to check third
party if the document or the channel is not available. However, this
could be incorrect in the case where the channel is not available
because the WindowContext is not ready yet. To address this issue, we
use the browingContext of the document to check third party.

Pushed by tihuang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f41b0cceba4e Part 1: Add a test to verify setting cookies with document.write() in an iframe will have correct partitionKey. r=pbz https://hg.mozilla.org/integration/autoland/rev/518cc5289f58 Part 2: Implement AntiTrackingUtils::IsThirdPartyContext(). r=dimi https://hg.mozilla.org/integration/autoland/rev/3d8181bdd6c2 Part 3: Use AntiTrackingUtils::IsThirdPartyContext() to check third party in AntiTrackingUtils::IsThirdPartyWindow() if the channel is not available. r=pbz
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 95 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: