Closed Bug 1748005 Opened 3 years ago Closed 2 years ago

Getting Uncaught DOMException: The operation is insecure while opening Websocket using 'wss' protocol

Categories

(Core :: DOM: Networking, defect, P2)

Firefox 95
defect

Tracking

()

RESOLVED FIXED
106 Branch
Tracking Status
firefox-esr91 --- wontfix
firefox-esr102 --- wontfix
firefox101 --- wontfix
firefox102 --- wontfix
firefox103 --- wontfix
firefox104 --- wontfix
firefox105 --- wontfix
firefox106 --- fixed

People

(Reporter: pauloramires, Assigned: acreskey)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0

Steps to reproduce:

I am developing an app using the VueJS framework and Vite.

When I start a local development server (https://localhost:3000), Vite launches a Hot Module Reload (HMR) server listening on wss://localhost:3000, which refreshes the page on source-code changes.

The local server uses a self-signed certificate.
After launching the local development server, I add a security exception in Firefox.

Actual results:

Browsing https://localhost:3000 with Firefox, I can see the app, plus HMR works OK (page reloads on every change).

Now, if I load my local server via a third-party iframe (SaaS app, LeanIX), I get an "Uncaught DOMException: The operation is insecure." error when the WebSocket connects.
The Vite client tries to connect to "wss://localhost:3000", but my browser URL is something like "https://app.leanix.net/workspace/reports/dev/new?url=https:%2F%2Flocalhost:3000"

This setup was working with Firefox before, and still works for Chrome and Edge.

Expected results:

I should be able to load the URL "https://app.leanix.net/workspace/reports/dev/new?url=https:%2F%2Flocalhost:3000" with Firefox, and my Vite client (embedded in the app) should be able to connect to "wss://localhost:3000"
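
For readers unfamiliar with the failing step, here is a minimal sketch of an HMR-style client deriving its socket URL and opening the connection. The helper names `hmrSocketUrl` and `connectHmr` and the `SocketFactory` type are hypothetical and are not Vite's actual client code. The sketch assumes, as the stack trace in this report suggests, that the exception is raised synchronously when the socket is constructed, so a try/catch around construction can observe it.

```typescript
// Hypothetical helper: derive the HMR socket URL from the dev server's page URL.
function hmrSocketUrl(pageUrl: string): string {
  const url = new URL(pageUrl);
  const scheme = url.protocol === "https:" ? "wss:" : "ws:";
  return scheme + "//" + url.host;
}

// Minimal socket shape so the sketch does not depend on browser typings.
interface SocketLike {
  close(): void;
}
type SocketFactory = (url: string) => SocketLike;

// Try to open the socket and report a construction-time failure
// instead of letting it surface as an uncaught exception.
function connectHmr(
  pageUrl: string,
  openSocket: SocketFactory
): { ok: boolean; error?: string } {
  try {
    openSocket(hmrSocketUrl(pageUrl));
    return { ok: true };
  } catch (e) {
    const name = e instanceof Error ? e.name : String(e);
    return { ok: false, error: name };
  }
}
```

In a browser, the factory would typically be `(u) => new WebSocket(u)`; it is passed in here only so the sketch can be exercised outside a browser.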

The Bugbug bot thinks this bug should belong to the 'Firefox::Security' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Security
Attached image 02_exception_thrown_line_28.png (deleted) —
Attached image 03_chrome_loads_ok.png (deleted) —

Moving this over to a potential component.
Hi pauloramires,
Could you please try to run mozregression and visit the hosted site to try to find the regression-range for this issue?
Here is how to do that: https://mozilla.github.io/mozregression/
Just to be sure this is not caused by some add-on or custom preference, could you check it out using Firefox Safe Mode (disables add-ons) or using a new Firefox profile?

Instructions for both of the mentioned options:
Firefox Safe Mode
https://support.mozilla.org/en-US/kb/troubleshoot-firefox-issues-using-safe-mode
New Firefox Profile
https://support.mozilla.org/en-US/kb/profile-manager-create-remove-switch-firefox-profiles

Component: Security → DOM: Service Workers
Flags: needinfo?(pauloramires)
Product: Firefox → Core

Hi Timea,
So I've launched Firefox in Safe Mode, url loaded ok.
Reloaded in normal mode, disabled installed extensions - just have zoom extension installed, page didn't load (got insecure exception).

Then switched Firefox profiles, from default-release to default, page now loads ok.

So it seems it was something related to this default-release profile.

Flags: needinfo?(pauloramires)

Timea,
Actually, switching profiles did not solve the situation.
After completely closing the browser and launching again, the problem persisted in the new profile.
However, when I start Firefox in Safe Mode, the exception does not occur.

mozregression seems to point to build d6e8528f as the origin of the regression:

2021-12-31T09:22:59.679000: INFO : Narrowed integration regression window from [383986e2, 55b351f8] (3 builds) to [383986e2, d6e8528f] (2 builds) (~1 steps left)
2021-12-31T09:23:01.100000: DEBUG : Found commit message:
Bug 1732358 - Part 5: Add the fission rollout slug to the GRADUATION_SET, r=mythmon

Depends on D133008

Differential Revision: https://phabricator.services.mozilla.com/D133659

2021-12-31T09:23:01.100000: DEBUG : Did not find a branch, checking all integration branches
2021-12-31T09:23:01.104000: INFO : The bisection is done.
2021-12-31T09:23:01.106000: INFO : Stopped
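
The "Narrowed integration regression window" line reflects a bisection over an ordered list of builds. The following is only an illustrative sketch of that idea, not mozregression's implementation; the function name `bisectBuilds` and the `BuildVerdict` type are hypothetical, and it assumes the first build is known good and the last is known bad.

```typescript
// Verdict for a tested build: "g" for good, "b" for bad.
type BuildVerdict = "g" | "b";

// Return the first bad build, assuming builds[0] is good and the last build is bad.
function bisectBuilds(
  builds: string[],
  testBuild: (build: string) => BuildVerdict
): string {
  let good = 0;
  let bad = builds.length - 1;
  // Halve the window until the good and bad builds are adjacent.
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (testBuild(builds[mid]) === "g") {
      good = mid;
    } else {
      bad = mid;
    }
  }
  return builds[bad];
}
```

With the three builds from the log above and a tester that marks only 383986e2 as good, a single test of d6e8528f is enough to narrow the window to [383986e2, d6e8528f].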

Kershaw, since I don't see a ServiceWorker is involved, could you help me to check if it is an issue related to WebSocket?

Flags: needinfo?(kershaw)

Could you share a minimal test case to reproduce this?
Are you able to reproduce this if fission.autostart is disabled?

Flags: needinfo?(kershaw) → needinfo?(pauloramires)

Disabling fission.autostart doesn't solve the issue.
Launching firefox in Troubleshooting mode does - however I do not have any extensions or extra themes installed.

The use case that led me to this issue:

I'm developing a web application using the VueJS framework, to be used as a special "plugin" for a SaaS platform called LeanIX.
The workflow for developing these "plugins" - or "Custom Reports" in LeanIX terminology - is as follows:

  1. Scaffold a new VueJS project using the create-lxr library via the npm init lxr@latest command.
  2. Launch a local development server using Vite with npm run dev.
    a. The local development server runs on https, and Vite creates a WSS server (HMR - Hot Module Reloading) that triggers a screen update on every source code change.
    b. Both servers (https and wss) listen on localhost:3000.
    c. On launch, a special URL with embedded credentials is provided to the user, pointing to a LeanIX iframe that allows previewing this "Custom Report" - running locally - as if it were deployed to the platform.

I've prepared a simple project that can be used to reproduce the issue. You can find it here.
I've also taken a step ahead and deployed a simple demo app here containing an iframe with a user-configurable url. In this case, when pointing to the local development server (https://localhost:3000), the issue does not occur.

Flags: needinfo?(pauloramires)
Component: DOM: Service Workers → Networking: WebSockets

Thanks for the test case. I can reproduce this locally.

As far as I can tell, the problem is at this line. I have no idea why we failed to get a document from the window context.
Eden, do you perhaps have an idea?

Component: Networking: WebSockets → DOM: Networking
Flags: needinfo?(echuang)

I've noticed this issue coming up again today.

I ran mozregression again, with the following output:

Tested autoland build: 383986e2 (verdict: g)
Tested autoland build: d6e8528f (verdict: b)

Log output:
2022-02-17T19:38:57.374000: INFO : Narrowed integration regression window from [383986e2, 55b351f8] (3 builds) to [383986e2, d6e8528f] (2 builds) (~1 steps left)
2022-02-17T19:38:57.381000: DEBUG : Starting merge handling...
2022-02-17T19:38:57.381000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=d6e8528f0a936df88369c71f4e390e31a4d621a1&full=1
2022-02-17T19:38:57.381000: DEBUG : redo: attempt 1/3
2022-02-17T19:38:57.382000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=d6e8528f0a936df88369c71f4e390e31a4d621a1&full=1',), kwargs: {}, attempt #1
2022-02-17T19:38:57.383000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2022-02-17T19:38:58.681000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=d6e8528f0a936df88369c71f4e390e31a4d621a1&full=1 HTTP/1.1" 200 None
2022-02-17T19:38:58.730000: DEBUG : Found commit message:
Bug 1732358 - Part 5: Add the fission rollout slug to the GRADUATION_SET, r=mythmon

Depends on D133008

Differential Revision: https://phabricator.services.mozilla.com/D133659

2022-02-17T19:38:58.730000: DEBUG : Did not find a branch, checking all integration branches
2022-02-17T19:38:58.745000: INFO : The bisection is done.
2022-02-17T19:38:58.749000: INFO : Stopped

Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]

(In reply to Kershaw Chang [:kershaw] from comment #10)

As far as I can tell, the problem is at this line. I have no idea why we failed to get a document from the window context.
Eden, do you perhaps have an idea?

Moving ni? to :smaug.

Flags: needinfo?(echuang) → needinfo?(bugs)

Bug 1660968 changed the loop to go through each window context and check whether a document is available. With Fission it is. (Aug 2: I think I meant 'isn't')
I assume in comment 9 FF wasn't restarted after changing fission pref.
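
To make the failure mode described above easier to follow, here is a toy model of such a loop. It is not Gecko's code: the types `ToyPrincipal` and `ToyWindowContext` and the function `findToyPrincipal` are hypothetical, and the behavior is inferred only from this thread (the walk moves towards the top while the principal is a null principal, and it fails when a window context has no document available in the current process).

```typescript
// Toy model only: names and shapes are hypothetical, not Gecko's types.
interface ToyPrincipal {
  origin: string;
  isNull: boolean; // true for an opaque (null) principal, e.g. a sandboxed frame
}

interface ToyWindowContext {
  // null models a document that is not available in this process
  doc: { principal: ToyPrincipal } | null;
  parentContext: ToyWindowContext | null;
}

// Walk from the innermost context towards the top until a
// non-null principal is found; fail if a document is unavailable.
function findToyPrincipal(start: ToyWindowContext): ToyPrincipal {
  let current: ToyWindowContext | null = start;
  let principal: ToyPrincipal | null = null;
  while (current !== null) {
    if (current.doc === null) {
      throw new Error("SecurityError: no document for window context");
    }
    principal = current.doc.principal;
    if (!principal.isNull) {
      break;
    }
    current = current.parentContext;
  }
  if (principal === null) {
    throw new Error("SecurityError: no principal found");
  }
  return principal;
}
```

Under this model, a frame with a regular principal succeeds immediately, while a frame with a null principal has to consult its parent and fails as soon as the parent's document is out of process.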

Flags: needinfo?(bugs)
Regressed by: 1660968

Set release status flags based on info from the regressing bug 1660968

Clearing the flags to get this re-triaged. This looks like a pretty bad regression.

Severity: S3 → --
Priority: P2 → --
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-queue]

I've added [necko-priority-queue] so that this bug gets fixed ASAP.

Severity: -- → S3
Priority: -- → P2

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Set release status flags based on info from the regressing bug 1660968

:kershaw is this something you plan on looking to fix and get an uplift in 103 or will it be for the 104 release cycle?

Flags: needinfo?(kershaw)

(In reply to Donal Meehan [:dmeehan] from comment #19)

:kershaw is this something you plan on looking to fix and get an uplift in 103 or will it be for the 104 release cycle?

I might have no time to fix this in this cycle, since I am going to take two weeks off.

Greg, could you find someone to work on this?
Thanks.

Flags: needinfo?(kershaw) → needinfo?(ghess)

Hi Reporter,

It looks like I am not able to reproduce this anymore.
Since I get an error Invalid LeanIX API token when trying to use your project to reproduce, I tried a different setup below. Please have a look and let me know if I did something wrong.

  1. Create a vite project as described in this page.
  2. Type npm run dev to launch the server.
  3. Open your demo app.
    I saw the following messages in the web console:
[vite] connecting... client.ts:16:8
[vite] connected. client.ts:53:14

Looks like the websocket connection works.

Could you try again with the latest Firefox on your side?
Thanks.

Flags: needinfo?(ghess) → needinfo?(pauloramires)

(In reply to Kershaw Chang [:kershaw] from comment #21)

Hi Kershaw,
the API token had expired. Please use this one:
vVTqm7zprRWBHHmOkzQsCnDX4BgVj2sCk57Tm5vz

The issue is still occurring.

Regards,
Paulo

Flags: needinfo?(pauloramires)

Hi Paulo,

Can you please provide the command line arguments you're using to launch your app?

Flags: needinfo?(pauloramires)
Assignee: nobody → acreskey

Hi Andrew, sorry for the late reply.

  1. git clone git@github.com:psantos9/firefox-demo-issue-custom-report.git
  2. npm install
  3. npm run dev (you'll get a url when the dev server launches)
  4. open a new browser window and add a local ssl exception for the https://localhost:3000 address
  5. open another browser window and navigate to the url from 4.
Flags: needinfo?(pauloramires)

(In reply to pauloramires from comment #24)

Before step 3, add a lxr.json file to the project root folder with the following content:

{
  "host": "app.leanix.net",
  "apitoken": "kxWDUh9xHLgMJjfx6eWqOM4UAwgjfF8X7Gej3zkB"
}
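
As a sanity check before launching the dev server, the file above can be validated with a few lines of code. This is a hedged sketch: the `LxrConfig` interface and the `parseLxrConfig` function are hypothetical names, only the two keys shown above are assumed, and create-lxr's own validation may differ.

```typescript
// Hypothetical shape of lxr.json, based only on the two keys shown above.
interface LxrConfig {
  host: string;
  apitoken: string;
}

// Parse the file contents and check that both keys are non-empty strings.
function parseLxrConfig(text: string): LxrConfig {
  const raw = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("lxr.json must contain a JSON object");
  }
  if (typeof raw.host !== "string" || raw.host === "") {
    throw new Error("lxr.json: 'host' must be a non-empty string");
  }
  if (typeof raw.apitoken !== "string" || raw.apitoken === "") {
    throw new Error("lxr.json: 'apitoken' must be a non-empty string");
  }
  return { host: raw.host, apitoken: raw.apitoken };
}
```

This only checks the file's shape; whether a token has expired can only be determined by the LeanIX service itself.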

(In reply to pauloramires from comment #25)

Sorry, wrong token (that one is expired):
use this one for the apitoken: LfCtQOREH9LM47D6Y5SftRvK8Np6DQrRXBALCxZW

Thanks for the detailed steps, Paulo - I can now run the project.

When I first visit https://localhost:3000, I add the local ssl exception
(because localhost:3000 uses an invalid security certificate. The certificate is not trusted because it is self-signed. Error code: MOZILLA_PKIX_ERROR_SELF_SIGNED_CERT).

But when I load https://localhost:3000 in a new window, I'm seeing the "Hello Vue 3 + TypeScript + Vite" page load without errors.
From the console:

[vite] connecting... client.ts:22:8
[vite] connected. client.ts:52:14

This is nightly, 104.0a1, with fission enabled.

Can you think of anything I may have missed?

Hi Andrew,
after you run npm run dev you'll get a launch url in the console.
You should open that launch url instead of https://localhost:3000.

vite v2.7.10 dev server running at:

  > Local:    https://localhost:3000/
  > Network:  https://192.168.15.146:3000/

  ready in 2331ms.

🚀 Your development server is available here => https://eu-6.leanix.net/customReportDev/reporting/dev?url=https%3A%2F%2Flocalhost%3A3000#access_token=eyJraWQiOiI0MDJjODg3NTBjZmJhOGQzZTQ0NjE0YzQ5YjBlYzg3NiIsImFsZyI6IlJTMjU2In0.eyJzdWIiOiIxNTg2MjA5OS1mNmY5LTRkNTMtOGE2OS03ZTk2MTYyZmY1ZGYuLTEzNzgwODg2NzFAdGVjaG5pY2FsdXNlcnMubGVhbml4LmxvY2FsIiwicHJpbmNpcGFsIjp7ImlkIjoiMGEzYTFjY2MtOWQ2ZC00MjNlLWI3MTktZjA1YzVlMDc4ZDdlIiwidXNlcm5hbWUiOiIxNTg2MjA5OS1mNmY5LTRkNTMtOGE2OS03ZTk2MTYyZmY1ZGYuLTEzNzgwODg2NzFAdGVjaG5pY2FsdXNlcnMubGVhbml4LmxvY2FsIiwicm9sZSI6IkFDQ09VTlRVU0VSIiwic3RhdHVzIjoiQUNUSVZFIiwiYWNjb3VudCI6eyJpZCI6IjNhZmEyMTZhLWFlMzEtNGM5ZS1hNzJmLWE5NWNjMTg0MDEyZCIsIm5hbWUiOiJmYXplbmRhZG9zb2Z0d2FyZSJ9LCJwZXJtaXNzaW9uIjp7
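
The launch URL above carries the local dev server address as a percent-encoded `url` query parameter and the access token in the fragment. Below is a small sketch of how such a URL could be assembled and read back. The function names `buildLaunchUrl` and `embeddedDevServer` are hypothetical, the path is copied from the console output above, and the token is a placeholder.

```typescript
// Hypothetical helper: assemble a launch URL in the shape printed by the dev server.
function buildLaunchUrl(
  host: string,
  devServerUrl: string,
  accessToken: string
): string {
  return (
    "https://" + host + "/customReportDev/reporting/dev" +
    "?url=" + encodeURIComponent(devServerUrl) +
    "#access_token=" + accessToken
  );
}

// Hypothetical helper: recover the dev server address that the page will embed.
function embeddedDevServer(launchUrl: string): string | null {
  return new URL(launchUrl).searchParams.get("url");
}
```

The address recovered this way (https://localhost:3000) is what the LeanIX page loads in its iframe, which is why opening https://localhost:3000 directly does not exercise the failing path.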

I can reproduce the DOMException, thank you Paulo.

Uncaught DOMException: The operation is insecure. client.ts:28
    <anonymous> client.ts:28

Let me find a solution.

Just to confirm: with fission.autostart set to false and the browser restarted, I am no longer able to reproduce the DOMException.

It looks like the problem occurs when trying to use secure websockets from this sandboxed iframe:

<iframe _ngcontent-upv-c544="" sandbox="allow-scripts allow-downloads" src="https://localhost:3000?v=fb166ce&amp;reportId=dev&amp;bookmark=default"></iframe>
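
Note that the sandbox attribute above does not include allow-same-origin, so the framed document is given an opaque origin. A minimal sketch of that rule follows; the function name `hasOpaqueOrigin` is hypothetical, and the sketch only inspects the attribute's tokens rather than any browser state.

```typescript
// Returns true when an iframe with this sandbox attribute value
// would load its document with an opaque origin.
// Pass null when the iframe has no sandbox attribute at all.
function hasOpaqueOrigin(sandboxAttr: string | null): boolean {
  if (sandboxAttr === null) {
    return false; // not sandboxed
  }
  const tokens = sandboxAttr.toLowerCase().split(/\s+/);
  // Without allow-same-origin, the sandboxed document gets an opaque origin.
  return tokens.indexOf("allow-same-origin") === -1;
}
```

For the iframe above, `hasOpaqueOrigin("allow-scripts allow-downloads")` is true, which matches the NullPrincipal discussed later in this bug.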

Specifically, we do in fact fail here, as Kershaw pointed out in Comment 10.
In this case I see these fields on the windowContext: IsTop(): 1, SameOriginWithTop(): 0, IsSecureContext(): 1

But I will need help from someone with a better understanding of navigating the window contexts.
Maybe we have some common code for determining the principal?

Jens, would you be able to find someone who can help determine the correct loading principal?
As per comment 13 this was regressed in Bug 1660968.
Currently the scenario where this fails is very specific, websockets from an iframe, connecting to localhost.

Flags: needinfo?(jstutte)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #32)

Jens, would you be able to find someone who can help determine the correct loading principal?
As per comment 13 this was regressed in Bug 1660968.
Currently the scenario where this fails is very specific, websockets from an iframe, connecting to localhost.

:smaug, do you have enough context to answer this off the top of your head?

Flags: needinfo?(jstutte) → needinfo?(smaug)

Christoph, you may also be able to help here.
We have a scenario where we are traversing windowContexts trying to determine the Principal, but we ultimately fail.
https://searchfox.org/mozilla-central/rev/3cb31675aeffd10f1f6ae7c40e24b254da7798e5/dom/websocket/WebSocket.cpp#2785
A bit more context in Comment 13.
Would you be able to find someone who can maybe point to a common function that could be used here?
I'm a bit concerned about implementing this logic independently within WebSockets.

Flags: needinfo?(ckerschb)

Hey Andrew, querying the correct principal for webSockets can be a cumbersome task and currently I can't think of a better function to use than the one that is specific within WebSocket.cpp.

Reading through the comments, however, it seems we are dealing with a sandboxed iframe, which uses a NullPrincipal as the security context. I am not 100% sure, but I could imagine updating this code:

if (principal && !principal->GetIsNullPrincipal()) {
  break;
}

to use the PrecursorPrincipal instead. The PrecursorPrincipal should be the principal/security context that created the sandboxed iframe. So something like this should work:

if (principal->GetIsNullPrincipal()) {
  principal = principal->GetPrecursorPrincipal();
  principal.forget(aPrincipal);
  return NS_OK;
}

Please verify with :smaug and/or :nika, but I think that should work.
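
The idea in the suggestion above can be illustrated with a toy model. This is not Gecko code and not the landed patch: the `SketchPrincipal` type and the `effectivePrincipal` function are hypothetical, and the sketch only shows the fallback from a null principal to whatever precursor it records.

```typescript
// Toy model only: a principal that may record a precursor.
interface SketchPrincipal {
  origin: string;
  isNull: boolean;
  // For a null principal, the principal it was derived from, if known.
  precursor: SketchPrincipal | null;
}

// Use the precursor of a null principal when one is recorded;
// otherwise keep the principal as it is.
function effectivePrincipal(p: SketchPrincipal): SketchPrincipal {
  if (p.isNull && p.precursor !== null) {
    return p.precursor;
  }
  return p;
}
```

The design question raised in review is exactly whether this fallback is acceptable for loading decisions, since the precursor is an internal implementation detail rather than a concept from the standard.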

Flags: needinfo?(ckerschb)

Use the PrecursorPrincipal of NullPrincipals to correctly handle sandboxed iframes.

Thanks, Christoph. I've put that up for review and discussion.

Some thoughts from Nika on the patch that I'm copying here to further the discussion:

nika:
In general the precursor principal isn't a concept which actually occurs in the standard, it only exists as an internal implementation detail. I'm uncomfortable actually using it as a loading principal for making loading decisions here.

It seems somewhat normal that a sandboxed frame would be unable to access a websocket to a given page given that it is sandboxed and forced to have an opaque principal. Changing this seems odd to me.

I haven't read over the bug yet, and won't until next week, so I might be missing some earlier discussion on this. Apologies if this has already been discussed.

Hi pauloramires -- can you please provide a new API token, this one has expired with this error: "💥 Invalid LeanIX API token"

Flags: needinfo?(pauloramires)

Hi Andrew, sorry for the late reply.
crxv6xLnG3UvpXjs3gEN9GfM93a5rCVAhvzQP2uh

Flags: needinfo?(pauloramires)

Thanks Paulo. I've verified that patch works in your scenario.
We're just making sure that we handle this situation as well as others correctly.

Pushed by acreskey@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7f84017ef241 Getting Uncaught DOMException: The operation is insecure while opening Websocket using 'wss' protocol r=smaug
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 106 Branch
Regressions: 1793868
Flags: needinfo?(smaug)