Closed Bug 1444895 Opened 7 years ago Closed 5 years ago

Treeherder should renew auth0 credentials if they expired to prevent frequent logout (not shown in UI until page reloaded)

Categories

(Tree Management :: Treeherder: Frontend, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: marco, Assigned: sclements)

References

(Blocks 1 open bug)

Details

(Whiteboard: [domsecurity-backlog1])

Attachments

(3 files)

I'm getting logged off within one hour from last usage. I'm using containers and I set up taskcluster to always load in the "Work" container.
Have you set up both tools.taskcluster.net and login.taskcluster.net to be in that container?
Component: Authentication → Login
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #1)
> Have you set up both tools.taskcluster.net and login.taskcluster.net to be in that container?
No, maybe that was it. I'll report back in a few hours.
No, I was logged out again.
I'm betting this is a containers issue. We had two other reports of containers issues this morning via irc. Likely something landed in nightly that broke the login process.
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #4)
> I'm betting this is a containers issue. We had two other reports of containers issues this morning via irc. Likely something landed in nightly that broke the login process.
I've been seeing this for a while though, at least a few weeks if not more.
What do you see in the console when you get logged off?
Could not renew login: Object { error: "login_required", errorDescription: "Login required", state: "ZHt7nYgSSSssAiDklDjjAoz_Q57IjNSg" }
OK, that's auth0 telling us that you're no longer logged into auth0. If you go back to sso.mozilla.com after that point, you will likely find yourself logged out. Although it may automatically log you back in -- I think that was a recent addition by the IAM team. Maybe :kang can help figure out what's going on here? Does the automatic re-login work in hidden iframes?
Flags: needinfo?(gdestuynder)
The auto-login function triggers when the auth0 login page is loaded, i.e. when the RP believes the user session should be rechecked. Auto-login simply forces the `prompt=none` parameter, which is a "silent" login, even when the RP does not support it. If the RP supports `prompt=none` (I believe taskcluster does), you don't even see the auto-login, as the RP directly performs the check (which is a lot faster and displays nothing).

This attempts to log you in automatically, as the name says, and will only succeed if you have a valid auth0 session cookie. auth0 session cookies are valid up to 30 days at the moment; however, they become invalid if you haven't logged in to any RP for ~3 consecutive days (say you logged in in a container tab, went away for 3 days without touching it, and come back on the 4th day: if the RP logged you out, auth0 won't auto-login).

Note: from the auth0 auth logs it looks like the auth0 session is gone from your user-agent and you have to reauthenticate several times per hour, so I suspect :dustin might be onto something regarding Nightly container tabs having an issue. If you want to check, open the Firefox dev tools and look for the cookie called "auth0" on auth.mozilla.auth0.com and see if it exists, is still valid, and is sent correctly to auth0.
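As a rough illustration of the two limits described in this comment (a sketch only: the 30-day and ~3-day values are read off the comment and may not match the real auth0 tenant configuration, and the function and parameter names are made up for the example):

```python
from datetime import datetime, timedelta

# Assumed values taken from the comment; the actual auth0 settings may differ.
MAX_SESSION_LIFETIME = timedelta(days=30)
MAX_IDLE_TIME = timedelta(days=3)


def auto_login_would_succeed(session_created, last_rp_login, now):
    """Return True if a silent (prompt=none) login could still succeed.

    Both conditions must hold: the session cookie is younger than the
    maximum lifetime, and the user logged in to some RP recently enough.
    """
    within_lifetime = now - session_created <= MAX_SESSION_LIFETIME
    recently_used = now - last_rp_login <= MAX_IDLE_TIME
    return within_lifetime and recently_used
```

With these assumed limits, a session last used 4 days ago fails the check even though the cookie itself is well under 30 days old, which is the "come back on the 4th day" case.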
Flags: needinfo?(gdestuynder)
I'm seeing the same problem (being logged out too soon) on another website too (alitalia.it) that I use in another container. Sounds like the issue is likely related to containers.
Component: Login → DOM: Security
Product: Taskcluster → Core
I wonder if there are some cases where cookies aren't sent when in a container. :baku, perhaps a redirect or another edge case where the userContextId hasn't been passed, or similar?
Flags: needinfo?(amarchesini)
Priority: -- → P3
Summary: Automatic logoff from Taskcluster is too frequent → Automatic logoff from Taskcluster is too frequent (when using Containers)
Whiteboard: [domsecurity-backlog1]

Is anyone still seeing this with Taskcluster ? It seems to have stopped for me at some point, currently using 66.0 betas.

let's say "no" and reopen if wrong

Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(amarchesini)
Resolution: --- → FIXED

I'm still seeing this unfortunately :(

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Prior to the taskcluster switchover on November 9, 2019 I used to have treeherder log me out every Monday, like clockwork.

Post-deployment of the new taskcluster, I am seeing logoffs often in a matter of days, sometimes even hours.

I am using containers for all of my treeherder tabs, on Firefox 72.0a1, macosx1014 and macosx1015.

Are treeherder, https://firefox-ci-tc.services.mozilla.com, and auth0 all in the same container?

Yes - my typical environment has 3 different containers running, each loaded with treeherder, taskcluster and auth0.

Say I take the personal container:

  • open treeherder
  • log in
  • prompted for auth0 if required
  • approve via duo

Hm, if everything is in the same container, then this should work "normally", like a regular browser session, right? The cases where we've seen issues with containers are because when a token renewal occurred, the service being called to check the renewal wasn't active in that container.

So I'm guessing there's some service that's pinned to a different container, or that something in the process of opening new tabs for the sign-in process "loses track" of what container it's operating in.

Yes, the containers should work like normal browser sessions, just separated out, so from the perspective of treeherder there should be 3 separate sessions for user egao when I use 3 containers. I am not familiar with the technical implementation of how containers work, so I cannot comment on that portion.

Note, I only use containers to separate out visually the work I am doing, so I'm not relying on some intrinsic behavior of the container itself for my work. I could replace my workflow with a tab group and it would do what I would like it to.

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #17)

Are treeherder, https://firefox-ci-tc.services.mozilla.com, and auth0 all in the same container?

Yes for me too (and I only have one container where they are opened, and that's the only container I use other than the default). Maybe there is some intermediary domain which is not pinned to the container and so is opened in the default container.

Are there tools in Firefox that help debug container issues like this? This must be fairly common..

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #22)

Are there tools in Firefox that help debug container issues like this? This must be fairly common..

Andrea, maybe you know? IIRC you worked on containers.

Flags: needinfo?(amarchesini)

jmaher also noticed the issue this week as did Andreea Pavel. At least the latter doesn't use containers.

Andrea, maybe you know? IIRC you worked on containers.

Redirect to jkt. I use gdb/rr...

Flags: needinfo?(amarchesini) → needinfo?(jkt)

OK, sounds like containers are off the hook (or we're dealing with two issues..).

Can someone provide a reproduction recipe?

Flags: needinfo?(jkt)

As an update, this automatic logoff also occurs for Bugzilla and Treeherder, so is not an issue limited to Taskcluster.

There is no reliable step to reproduce, since it seems to occur out of the blue.

  1. log in and auth0 at all locations as required (treeherder, taskcluster, bugzilla, etc)
  2. use container for the day
  3. put computer to sleep
  4. wake up next morning
  5. refresh tabs for treeherder, taskcluster, etc.

With some probability, user will be logged out of some or all of the services.

I am not experiencing a logout from Bugzilla. This morning, I logged into the firefox-ci cluster to attempt to retrigger a hook and later into Treeherder to request the task. At least 7h later, TH showed me as logged out. Not interacting with the tools into which one logged in or using the permissions associated with them might be relevant. (Had seen a logoff on another day ~6h after login).

Attached image image.png (deleted)

Each sheriff experienced 3 logouts since the beginning of our shift, and as far as we can tell, the logouts happened about every 2h. As you can see in the attached screenshot, Treeherder first showed the bug as associated with the failure and immediately after that the error, with the failure remaining unclassified.
Strangely, the Treeherder page showed us as still being logged in: the name was visible in the top right-hand corner until the page was refreshed, after which we had to log back in from the Login/Register button.

Flags: needinfo?(dustin)

Comment 28 was about a logout from Treeherder. I don't think Treeherder shows whether a user is "logged in" to Taskcluster or not. Overall, I think there are still a number of reports of what might be different bugs, all without enough detail to debug. If we're going to get to the bottom of this, I think we'll need a reproduction recipe, ideally starting with an Auth0 login in a fresh session. Probably starting a browser with a fresh profile would help with the "not interacting with the tools" that aryx mentioned, too.

Flags: needinfo?(dustin)
  1. Opened Treeherder in a new profile
  2. Logged in
  3. Let it idle for 7h.
  4. Tab still showed me as logged in.
  5. Reloaded the tab.
  6. Treeherder showed me as not logged in.

The current and the previous sheriff reported that each got logged out "a couple of times" but not during their previous shift (2.5 days ago).

Related to the Django upgrade?

Component: DOM: Security → Treeherder: Frontend
Flags: needinfo?(armenzg)
Product: Core → Tree Management
Version: unspecified → ---

This sounds like an auth0/TH backend issue that doesn't have anything to do with Taskcluster credentials. Those credentials are separately retrieved and are independent from each other as of the Nov 9th changeover. When you log in to Treeherder, your moz credentials are being retrieved and we're doing some session management via our django backend (classifying failures involves saving changes to our database but wouldn't have anything to do with taskcluster specifically, which is what this original bug was for).

I don't think the existing TH/auth0/session management logic was changed too much for the Nov 9th changeover, but I'll take a look. From bug 1605651 and Edwin's comments, this seems like it's been happening for long enough that I'm not sure the recent Django upgrade would have anything to do with it. I'll look into this next week.

Flags: needinfo?(armenzg)

Devtools' Storage tab lists

  • 103 com.auth0.auth.... cookies (previously discussed in bug 1507454; also see below)
  • a sessionid cookie (expires 2h after the login)
  • a csrftoken one.

If the sessionid expires, is there supposed to be an automatic renewal or has the lifetime been shortened?

Regarding the many auth cookies:
Sheriff Narcis had ~15 auth cookies (oldest from Friday). My oldest is from Saturday. Does every treeherder tab initiate its own credential renewal? Sometimes cookies have expiration dates in a narrow time window, e.g. yesterday ~13:35 UTC, 19 cookies expired in 5 minutes. That Firefox still has the expired cookies is bug 691973.
That the issue can be reproduced in a new profile with one auth cookie shows the number of auth cookies is unrelated to the logouts.

Summary: Automatic logoff from Taskcluster is too frequent (when using Containers) → Treeherder should renew auth0 credentials if they expired to prevent frequent logout (not shown in UI until page reloaded)

Confirming that the logout happens after 2h when the sessionid cookie expires.

(FTR, the com.auth0 cookies count increases with every manual reload of a treeherder tab.)

Assignee: nobody → sclements
Status: REOPENED → ASSIGNED
Priority: P3 → P1

These are all good questions Sebastian, thanks for the details. I'm looking into this today.

After some investigating, there appear to be two issues:

  1. We are receiving an access code from auth0 during login that includes an expires_in=7200 param (2 hours) instead of 24 hours. This might be an auth0 issue that our iam team/infosec might be able to help with. Bug 1611030 was filed and has more details.
  2. The auto/silent renewing might not be working. After leaving myself logged in and looking at the console by chance, I noticed a Could not renew login:, { error: "timeout", errorDescription: "Timeout during authentication renew"} message. I'll continue looking into this part of the code tomorrow. At the very least, we should be showing the user is logged out when that error is caught.
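
To make the second point concrete, here is a minimal sketch of the behaviour being asked for: when a silent renewal fails, the login state flips to logged out instead of staying stale. This is not the Treeherder front-end code (which is JavaScript); `Session`, `renew_or_log_out` and the `renew` callable are hypothetical names used only for illustration.

```python
class Session:
    """Hypothetical stand-in for the front-end login state."""

    def __init__(self):
        self.logged_in = True
        self.last_error = None


def renew_or_log_out(session, renew):
    """Try a silent renewal; on failure, mark the session as logged out.

    `renew` is a hypothetical callable that raises RuntimeError carrying
    the auth0 error code (for example "timeout" or "login_required").
    """
    try:
        renew()
    except RuntimeError as err:
        # Surface the failure right away rather than on the next page reload.
        session.logged_in = False
        session.last_error = str(err)
    return session.logged_in
```

The point of the sketch is only the `except` branch: the error is caught, as it already is today according to the console message, but the visible state is updated as well.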

Sheriffs, I'll be curious to see if you encounter the same error message next time you try to classify and get a message indicating you're not logged in (just check the console before refreshing).

Yes, the same error message can be found in the console here.

Update:

With the recent Django 3 upgrade, the default value of x-frame-options was changed from "sameorigin" to "deny" so this was causing the frequent timeouts/logging out issues (the auth0 silent renewal for the access token is set to run every 15 minutes - in an iframe - and resets if a new tab is opened I think).

As for why the access token expires every 2 hours: it appears that we're using the implicit flow login for auth0, since we're requesting id_token token in the UI instead of code, which returns an access token with an expiration of 2 hours instead of 24 hours.
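
For readers unfamiliar with the two flows, the difference shows up in the `response_type` parameter of the authorization request, and the lifetime comes back as `expires_in` (in seconds). A small sketch using only the Python standard library; the `client_id` value and the sample callback fragment are placeholders, not Treeherder's real values:

```python
from urllib.parse import parse_qs, urlencode


def authorize_query(response_type, client_id="EXAMPLE_CLIENT_ID"):
    """Build the query string for an authorization request.

    The client_id default is a placeholder for illustration only.
    """
    return urlencode({"response_type": response_type, "client_id": client_id})


def token_lifetime_hours(callback_fragment):
    """Read expires_in (seconds) from an implicit-flow callback fragment."""
    params = parse_qs(callback_fragment)
    return int(params["expires_in"][0]) / 3600


# Implicit flow, as requested by the UI today.
implicit = authorize_query("id_token token")
# Authorization code flow, the alternative mentioned above.
code_flow = authorize_query("code")
# A made-up callback fragment carrying the 7200 seconds seen in this bug.
hours = token_lifetime_hours("access_token=abc&expires_in=7200&token_type=Bearer")
```

Here `hours` works out to 2.0, matching the 2-hour logout interval reported by the sheriffs.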

I'd have to look more into what a change to the authorization flow for 24 hour access tokens would entail. So for now, the patch sets the x-frame-options back to sameorigin and fixes the logout mechanism so the UI updates when users are logged out due to a silent renewal failure.
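
For anyone hitting the same thing after a Django 3 upgrade, the relevant knob is Django's `X_FRAME_OPTIONS` setting. A sketch of what such a settings change could look like; whether the actual Treeherder patch is written exactly this way is an assumption:

```python
# settings.py (sketch, not the actual Treeherder settings file)

# Django 3.0 changed the default from "SAMEORIGIN" to "DENY". Setting the
# value explicitly lets pages be framed by the same origin again, which the
# silent renewal iframe relies on.
X_FRAME_OPTIONS = "SAMEORIGIN"

MIDDLEWARE = [
    # ... other middleware ...
    # This middleware adds the X-Frame-Options header to responses.
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]
```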

Just made a minor tweak to the log out logic ^

I have other priorities to work on so closing out this bug. If anyone has other issues with the login/credentials - or would like us to explore a switch to the 24 hour access token - please file a new bug.

Status: ASSIGNED → RESOLVED
Closed: 6 years ago → 5 years ago
Resolution: --- → FIXED