Closed Bug 760910 Opened 12 years ago Closed 12 years ago

Handle 401 responses correctly

Categories

(Web Apps Graveyard :: AppsInTheCloud, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: anant, Assigned: nick)

References

Details

(Whiteboard: [blocking-aitc+][qa-])

Attachments

(1 file, 3 obsolete files)

We're refreshing tokens every 4 minutes when there are pending requests. We shouldn't refresh tokens unless they fail, and also not assume we know how long they're good for (this should be server-controlled). Don't fail on a 401 the first time, update token and try again (if we're still failing, die and log that the server is boned, and maybe even report it via some magic API!)
Whiteboard: [blocking-aitc]
Whiteboard: [blocking-aitc] → [blocking-aitc+]
Also worth noting: when you refresh your token you may be assigned a different endpoint URL than you had previously. If the endpoint URL is persisted anywhere then it may need to be updated.
Ryan, can we add a field to the token response that says how long it is valid for? If you agree, let's file a bug for it and block this one on that?
We'll add a 'duration' field in the mapping returned by the GET see http://docs.services.mozilla.com/token/apis.html
Assignee: nobody → alexis
While I can see merit to having a timestamp (I know it's gone in and out of the spec several times), I don't think this should be a blocking issue. If the client gets a 401, it should head back to the tokenserver, and if it gets another 401, sleep for some period of time (1/2 hour?). There may be scenarios where an unexpired token gets rejected - most notably if we have to pull a key. Relying on the timestamp to tell us the token remains legitimate seems dangerous.
Also, looking at the Duration field in the revised docs, it seems like that should return a Date rather than an integer. If I was building a client for the api, first thing I'd need to do would be to figure out a date from that, so that I could know without having to preserve context.
The problem with returning a date is that it is relative to the server and not to the client, and thus can be innacurate for the latter. Having a way to know what's the delay between two token generation calls might be of use (for instance to setup a callback on the client, which will update the token to a new one automatically)
(In reply to Toby Elliott [:telliott] from comment #5) > There may be scenarios where an unexpired token gets rejected - most notably > if we have to pull a key. Relying on the timestamp to tell us the token > remains legitimate seems dangerous. (for clarification, there is value to an expiry - the client may *know* they need to get a new one and save a 401 round-trip, but that's a separate issue from the one described here)
(In reply to Alexis Metaireau (:alexis) from comment #7) > The problem with returning a date is that it is relative to the server and > not to the client, and thus can be innacurate for the latter. > The date presumably has some sort of timezone affixed to it that can be parsed and converted into whatever relevant time we want. That's worked for HTTP for some time. If that's not sufficient, then unix timestamp would do. Also, if the date is inaccurate, that shouldn't be a problem. Two scenarios: 1) The client thinks the token has expired even though it has not, goes and gets a new one. All good (save a degenerate case where the client immediately thinks the token has expired. That seems extreme) 2) The client thinks the token has not expired where it has, it tries to use it, gets a 401 and goes and gets the new token. All good.
@Toby: The client just needs a duration so it optimizes the time between two token server calls -- to avoid extra 401s. Why do you want us to send the client a date that's specific to the server ? How will the client know when it should call again the TS server ? I am curious to see how you are going to implement this. With a duration, in pseudo-code it's: def get_token_from_server(): token = server.get('/token') callback(get_token_from_server, when=now+token['duration']) We don't bother with timezones, or complicated data parsing. We just know that the token gets deprecated around now + duration
You're correct if all we plan to use it for is the pseudocode, in which case, all good. I'm thinking about persistence across restarts.
Whiteboard: [blocking-aitc+] → [blocking-aitc+][qa+]
(In reply to Toby Elliott [:telliott] from comment #5) > While I can see merit to having a timestamp (I know it's gone in and out of > the spec several times), I don't think this should be a blocking issue. If > the client gets a 401, it should head back to the tokenserver, and if it > gets another 401, sleep for some period of time (1/2 hour?). > > There may be scenarios where an unexpired token gets rejected - most notably > if we have to pull a key. Relying on the timestamp to tell us the token > remains legitimate seems dangerous. The intention behind blocking on this was that it seemed beneficial to have token expiry time being controlled by the server. Ops could, for instance, tune the expiry time to manage load.
(In reply to Tarek Ziadé (:tarek) from comment #10) > def get_token_from_server(): > token = server.get('/token') > callback(get_token_from_server, when=now+token['duration']) This is almost exactly what we were planning to implement, so it'd be better to have a duration as opposed to a date. We are currently not persisting the token across restarts.
(In reply to Alexis Metaireau (:alexis) from comment #4) > Should be fixed in the last master (r+ by tarek), see: Is this available on the dev or staging server now? I'd like to test a client patch for this.
Anant, this is okay. We have 0.9 in staging and in production now. Please mark it as resolved if everything works for you.
Status: NEW → ASSIGNED
Not closing this quite yet, since we also need to finish up the client side portion of this. Thanks guys!
Assignee: alexis → nobody
Status: ASSIGNED → NEW
I'd hold off on flagging qa+ until the patch is heading to land (cause that's the time when the parameter matters).
Whiteboard: [blocking-aitc+][qa+] → [blocking-aitc+]
Assignee: nobody → ndesaulniers
Status: NEW → ASSIGNED
OS: Linux → All
Hardware: x86 → All
Depends on: 760909
Attached patch rev1 (obsolete) (deleted) — Splinter Review
Attachment #640378 - Flags: review?(gps)
Comment on attachment 640378 [details] [diff] [review] rev1 Review of attachment 640378 [details] [diff] [review]: ----------------------------------------------------------------- Nit: Set proper summary line for commit message. Bug 760910 - Handle 401 responses in AITC client code ::: services/aitc/modules/client.js @@ +369,4 @@ > // Set values from X-Backoff and Retry-After headers, if present > _setBackoff: function _setBackoff(req) { > let backoff = 0; > + let successfulStatusCodes = [201, 204, 401]; 401 isn't really a "successful status code." Perhaps rename to indicate the intended purpose? statusCodesWithoutBackoff? @@ +405,4 @@ > // Fuzz backoff time so all client don't retry at the same time > backoff = Math.floor(Math.random() * backoff) + backoff; > this._state.set("backoff", "" + (time + backoff)); > + this._log.debug("CLIENT SETTING BACKOFF TO: " + backoff); No need to put this in caps. You may also want to crank up the log level to info, as this is more important than debug, IMO. ::: services/aitc/modules/manager.js @@ +30,4 @@ > * methods on this object until the callback has been invoked, doing so will > * lead to undefined behaviour. > */ > +function AitcManager(cb, premadeClient, premadeToken) { Add docs for premadeToken's purpose. You may also want to use default values to explicitly set this to null instead of undefined. @@ +34,5 @@ > this._client = null; > this._getTimer = null; > this._putTimer = null; > > this._lastToken = 0; Consider renaming to _lastTokenTime. @@ +35,5 @@ > this._getTimer = null; > this._putTimer = null; > > this._lastToken = 0; > + this._tokenDuration = 240000; // 4 minutes Why did you nuke the constant initial value? Magic numbers should almost always be constified. @@ +226,5 @@ > _validateToken: function _validateToken(func) { > + let timeSinceLastToken = Date.now() - this._lastToken; > + this._log.debug('TIME SINCE LAST TOKEN: ' + timeSinceLastToken); > + this._log.debug('TOKEN DURATION: ' + this._tokenDuration); > + this._log.debug('TOKEN FLAG: ' + this._invalidTokenFlag); Nit: Double quotes. Nit: No need for caps. Nit: These can all be in one log line. @@ +244,4 @@ > this._refreshToken(function(err, done) { > if (!done) { > this._log.warn("_checkServer could not refresh token, aborting"); > return; So if there is an error, func is never invoked. This seems wrong and likely indicates a bug and poor test coverage. Either fix it with the next patch (perhaps ask Anant first) or file a follow-up bug blocking release. @@ +245,5 @@ > if (!done) { > this._log.warn("_checkServer could not refresh token, aborting"); > return; > } > + self._log.debug("IN REFRESH TOKEN CALLBACK L243 WITH NEW TOKEN"); "Got refreshed token." @@ +266,4 @@ > return; > } > > + this._log.debug('IN CHECKSERVER'); Say what you are actually doing rather than dropping a debug label. @@ +278,5 @@ > this._client.getApps(function gotApps(err, apps) { > if (err) { > // Error was logged in client. > + if (err.authfailure) { > + self._log.debug('IN GETAPPS'); "Validating token due to client auth failure." @@ +279,5 @@ > if (err) { > // Error was logged in client. > + if (err.authfailure) { > + self._log.debug('IN GETAPPS'); > + self._validateToken(function (err) { Nit: always name functions. @@ +327,4 @@ > return; > } > > + this._log.debug('IN PROCESSQUEUE'); Nit: Better log message. @@ +342,4 @@ > // Send to end of queue if unsuccessful or err.removeFromQueue is false. > if (err && !err.removeFromQueue) { > self._log.info("PUT failed, re-adding to queue"); > + self._log.debug("ABOUT TO CHECK FOR AUTHFAILURE"); Is this necessary? @@ +344,5 @@ > self._log.info("PUT failed, re-adding to queue"); > + self._log.debug("ABOUT TO CHECK FOR AUTHFAILURE"); > + // Error was logged in client. > + if (err.authfailure) { > + self._log.debug(err.authfailure); Remove @@ +345,5 @@ > + self._log.debug("ABOUT TO CHECK FOR AUTHFAILURE"); > + // Error was logged in client. > + if (err.authfailure) { > + self._log.debug(err.authfailure); > + self._log.debug("IN PUT APPS, AUTHFAILURE OCCURRED"); "Auth failure when ..." @@ +352,5 @@ > + if (!err) { > + self._log.debug("SHOULD'VE GOTTEN A NEW TOKEN"); > + //self._log.debug("RESCHEDULING PUT"); > + //_reschedule(); > + //return; Eliminate dead code if it is really dead. @@ +501,5 @@ > + this._tokenDuration = parseInt(this._premadeToken.duration); > + this._log.debug("SETTING TOKEN DURATION TO: " + this._tokenDuration); > + this._lastToken = Date.now(); > + this._invalidTokenFlag = false; > + this._log.debug("UPDATED CLIENT TOKEN WITH PREMADE TOKEN"); Consolidate log messages. @@ +532,4 @@ > _gotToken: function _gotToken(err, tok, cb) { > if (!err) { > this._log.info("Got token from server: " + JSON.stringify(tok)); > + this._tokenDuration = parseInt(tok.duration); parseInt takes an optional radix argument. Always define it if you know what the integer representation is (you do here). ::: services/aitc/tests/unit/test_aitc_manager.js @@ +157,5 @@ > + mockRequestCount++; > + _('!!! MOCK REQUEST !!!'); > + _(manager._lastToken); > + _("Mock Request Count " + mockRequestCount); > + _('!!!!!!!!!!!!!!!!!!!!'); This seems excessive. ::: services/common/tokenserverclient.js @@ +114,4 @@ > * key (string) HTTP MAC shared symmetric key. > * endpoint (string) URL where service can be connected to. > * uid (string) user ID for requested service. > + * duration (string) the validity duration of the issued token. When did the token server change? We should split out this change into a separate bug and include unit test updates to reflect the server's new behavior. I filed bug 772261.
Attachment #640378 - Flags: review?(gps)
crap, forgot to qref the removal of my debug code. Now my noob-ness will forever be immortalized on bugzilla... patch rev2 inbound
Attached patch rev2 (obsolete) (deleted) — Splinter Review
how embarrassing!
Attachment #640378 - Attachment is obsolete: true
Attachment #640406 - Flags: review?(gps)
Attachment #640406 - Flags: review?(gps)
Sorry, I also need to make changes that Greg recommended.
(In reply to Gregory Szorc [:gps] from comment #19) > When did the token server change? We should split out this change into a > separate bug and include unit test updates to reflect the server's new > behavior. > > I filed bug 772261. This was changed on the server (stage) about 3 weeks ago, see comment #4.
Attached patch rev3 (obsolete) (deleted) — Splinter Review
Attachment #640406 - Attachment is obsolete: true
Attachment #640699 - Flags: review?(gps)
Comment on attachment 640699 [details] [diff] [review] rev3 Review of attachment 640699 [details] [diff] [review]: ----------------------------------------------------------------- What's the relationship between this and bug 760909? I would prefer to review them serially. I /think/ bug 760909 comes before this one?
Yes, I reuse test code from my previous patch in bug 760909.
Attachment #640699 - Flags: review?(gps)
Attachment #640699 - Attachment is obsolete: true
Attachment #642042 - Flags: review?(gps)
Comment on attachment 642042 [details] [diff] [review] AITC Client Handle 401 Responses Review of attachment 642042 [details] [diff] [review]: ----------------------------------------------------------------- Overall I think this is good. There are some places where repetitive code could get consolidated. But, I think that could be the realm of a follow-up bug. ::: services/aitc/modules/client.js @@ +376,4 @@ > // Set values from X-Backoff and Retry-After headers, if present. > _setBackoff: function _setBackoff(req) { > let backoff = 0; > + let statusCodesWithoutBackoff = [200, 201, 204, 401]; Need 304. Bitrotted. ::: services/aitc/modules/manager.js @@ +39,5 @@ > this._putTimer = null; > > + this._lastTokenTime = 0; > + this._tokenDuration = INITIAL_TOKEN_DURATION; > + this._premadeToken = premadeToken || null; Use default argument values. @@ +339,5 @@ > + self._log.error(err); > + } else { > + self._putApps(); > + } > + }); I've seen this pattern before. You think it's worth uplifting into a generic function? function _onValidateAndRun(fn, err) { if (err) { this._log.error(error); } fn.call(this); } Usage: this._validateToken(this._onValidateAndRun.bind(this, this._getApps)); ::: services/aitc/tests/unit/test_aitc_manager.js @@ +100,5 @@ > + manager._lastTokenTime = Date.now(); > + let mockRequestCount = 0; > + let clientFirstToken = null; > + > + server.onRequest = function mockstatus () { Nit: no space before ( @@ +115,5 @@ > + case 2: > + // Check that the client obtained a different token. > + do_check_neq(client.token.id, clientFirstToken.id); > + do_check_neq(client.token.key, clientFirstToken.key); > + server.stop(run_next_test); This will kill the server as it is serving the request. Does the test shut down OK? If so, great. That being said, it would be preferred to wait for the response before moving on to the next test.
Attachment #642042 - Flags: review?(gps) → review+
Whiteboard: [blocking-aitc+] → [blocking-aitc+], [fixed in services]
Is this possible to verify from an end-user perspective?
Whiteboard: [blocking-aitc+], [fixed in services] → [blocking-aitc+], [fixed in services], [qa?]
Nope.
Whiteboard: [blocking-aitc+], [fixed in services], [qa?] → [blocking-aitc+], [fixed in services], [qa-]
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [blocking-aitc+], [fixed in services], [qa-] → [blocking-aitc+][qa-]
Product: Web Apps → Web Apps Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: