Closed Bug 760909 Opened 13 years ago Closed 12 years ago

Implement client driven backoff

Categories

(Web Apps Graveyard :: AppsInTheCloud, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: anant, Assigned: nick)

References

Details

(Whiteboard: [blocking-aitc+][qa-])

Attachments

(2 files, 4 obsolete files)

AITC client should back off gradually if there are server errors, even without explicit headers saying so.
Whiteboard: [blocking-aitc]
Whiteboard: [blocking-aitc] → [blocking-aitc+]
Assignee: nobody → ndesaulniers
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #637556 - Flags: review?(gps)
Depends on: 750948
Comment on attachment 637556 [details] [diff] [review] patch Review of attachment 637556 [details] [diff] [review]: ----------------------------------------------------------------- Generally looks good. Mostly style nits. Since this is your first patch AFAICT, I'm going to ask for a fresh patch before I give out r+. ::: services/aitc/modules/client.js @@ +366,4 @@ > // Set values from X-Backoff and Retry-After headers, if present > _setBackoff: function _setBackoff(req) { > let backoff = 0; > + let status_code_whitelist = [201, 204]; Nit: use camelCase for variable names. Also, how about successfulStatusCodes instead? Future knowledge: JS sets will soon be iterable. Which means you will soon be able to do: let successfulStatusCodes = new Set([201, 204]); if (req.response.status in successfulStatusCodes) @@ +382,5 @@ > + backoff = this._state.get("backoff.initial", 30); > + if (this._backoff_multiplier * backoff < this._state.get("backoff.max", > + 3600)) { > + backoff *= this._backoff_multiplier; > + this._backoff_multiplier = this._backoff_multiplier << 1; Couple of style nits here. First, try to keep the comparison in the |if| as simple as possible. i.e. if (currentBackoff < maxBackoff). If you need an additional variable assignment to make it easier to read, so be it. Second, as much as I personally like bitwise operators, please *= 2 instead. @@ +393,1 @@ > } I'd definitely add a comment block somewhere documenting your backoff strategy. I can tell what it is by reading the code. But, this stuff is important and the comments help reinforce what is coded. ::: services/aitc/tests/unit/test_aitc_client.js @@ +114,4 @@ > > do_check_eq(first.origin, app.origin); > > + PREFS.set("backoff", "0"); If you find yourself writing test shutdown logic over and over, create a function: function advance(server=null) { PREFS.set("backoff", "0"); if (server) { server.stop(run_next_test); return; } run_next_test(); } @@ +180,5 @@ > + do_check_eq(client._backoff_multiplier, 4); > + }); > + client._putApp(client._makeRemoteApp(app), function(error, status) { > + do_check_eq(client._backoff_multiplier, 8); > + }); I would generally say that if you are calling into protected APIs you are testing wrong. You are testing the implementation details are some way: good tests test behavior. In other words, can you rewrite this to check wall times and the behavior is doing the right thing? If this means the test takes a few seconds to execute, so be it. Backoff is important. We need to be sure it works right. ::: services/common/tests/unit/aitcserver.js @@ +215,5 @@ > + * Returns a specific status code for testing. > + */ > + _respondWithMockStatus: function _respondWithMockStatus(request, response) { > + response.setStatusLine(request.httpVersion, this.mockStatusCode, > + "Method Not Allowed"); Can you respond with the proper HTTP status message for each code? This probably means the caller explicitly defines it since I'm not sure if you can get the strings from anywhere conveniently (sadly). @@ +223,5 @@ > * HTTP handler for requests to /1.0/ which don't have a specific user > * registered. > */ > _generalHandler: function _generalHandler(request, response) { > + if (this.mockStatusCode !== null) { if (this.mockStatusCode) If someone is dumb enough to assign 0, garbage in garbage out applies. @@ +226,5 @@ > _generalHandler: function _generalHandler(request, response) { > + if (this.mockStatusCode !== null) { > + this._respondWithMockStatus(request, response); > + } else { > + let path = request.path; If your function body just has two branches like this, I prefer the style of early return. It saves from having to indent a bunch of code.
Attachment #637556 - Flags: review?(gps)
All suggested changes are complete, except for the 'wall times' test. This is not something testable with just a client unit. It will have to be an integration test on the manager.
Status: NEW → ASSIGNED
Patch inbound, to run tests from services-central/: make -C objdir/services/aitc/tests/ && make -C objdir/services/aitc/tests/ xpcshell-tests && make -C objdir/services/common/tests/ xpcshell-tests
Attached patch patchrev2 (obsolete) (deleted) — Splinter Review
Updated patch to rev 2 as per :gps
Attachment #637556 - Attachment is obsolete: true
Attachment #638920 - Flags: review?(gps)
Attached patch patchrev3 (obsolete) (deleted) — Splinter Review
Forgot to qref some things into patchrev2.
Attachment #638920 - Attachment is obsolete: true
Attachment #638920 - Flags: review?(gps)
Attachment #639367 - Flags: review?
Attachment #639367 - Flags: review? → review?(gps)
Comment on attachment 639367 [details] [diff] [review] patchrev3 Review of attachment 639367 [details] [diff] [review]: ----------------------------------------------------------------- Almost there. ::: services/aitc/modules/client.js @@ +38,4 @@ > > this._state = state; > this._backoff = !!state.get("backoff", false); > + this._backoff_multiplier = 2; Nit: camelCase @@ +80,5 @@ > let self = this; > DOMApplicationRegistry.getManifestFor(app.origin, function gotManifest(m) { > + if (m) { > + app.name = m.name; > + } Why was this added? @@ +382,5 @@ > this._log.warn("X-Backoff header was seen: " + val); > + } else if (successfulStatusCodes.indexOf(req.response.status) === -1) { > + /* Bad status code > + Increase backoff multiplier exponentially unless that exceeds max > + backoff time */ Nit: Use // @@ +384,5 @@ > + /* Bad status code > + Increase backoff multiplier exponentially unless that exceeds max > + backoff time */ > + backoff = this._state.get("backoff.initial", 30000); > + let maxBackoff = this._state.get("backoff.max", 3600000); Extract magic numbers into constants. @@ +389,5 @@ > + let mulBackoff = this._backoff_multiplier * backoff; > + if (mulBackoff < maxBackoff) { > + backoff = mulBackoff; > + this._backoff_multiplier *= 2; > + } I think this would be better written as: backoff = Math.min(this._backoffMultiplier * backoff, maxBackoff); this._backoffMultiplier *= 2; ::: services/aitc/modules/manager.js @@ +57,5 @@ > > + // Used for testing. > + if (premadeClient) { > + self._client = premadeClient; > + return; So the callback doesn't get invoked if using the premade client? You should document the parameter in the docblock above the function definition. ::: services/aitc/tests/unit/test_aitc_manager.js @@ +43,5 @@ > + modifiedAt: Date.now(), > + receipts: [] > + }; > + > + app[remote ? 'manifestPath' : 'manifestURL'] = "/manifest.webapp"; Nit: double quotes for all string literals not containing double quotes. @@ +54,5 @@ > + > + let token = { > + endpoint: server.url + username, > + id: 'ID-HERE', > + key: 'KEY-HERE' Nit: double quotes ::: services/common/aitcserver.js @@ +137,5 @@ > + this.mockStatus = { > + code: null, > + method: null > + }; > + this.onmockrequest = null; Nit: onMockRequest Actually, I'd just make this onRequest and always run it if it is defined. Generic behavior FTW.
Attachment #639367 - Flags: review?(gps)
(In reply to Gregory Szorc [:gps] from comment #7) > @@ +80,5 @@ > > let self = this; > > DOMApplicationRegistry.getManifestFor(app.origin, function gotManifest(m) { > > + if (m) { > > + app.name = m.name; > > + } > > Why was this added? One of the unit tests tried to sync an app without first installing it, and it caused a JS exception in this function. The case should actually never occur (i.e. the manifest for an installed app missing), so this check ensures that we catch that and fail appropriately.
(In reply to Nick Desaulniers [:\n] from comment #8) > (In reply to Gregory Szorc [:gps] from comment #7) > > @@ +80,5 @@ > > > let self = this; > > > DOMApplicationRegistry.getManifestFor(app.origin, function gotManifest(m) { > > > + if (m) { > > > + app.name = m.name; > > > + } > > > > Why was this added? > > One of the unit tests tried to sync an app without first installing it, and > it caused a JS exception in this function. The case should actually never > occur (i.e. the manifest for an installed app missing), so this check > ensures that we catch that and fail appropriately. You shouldn't need to verify that DOMApplicationRegistry's API is behaving properly. If a test was doing something wrong, fix the test.
(In reply to Gregory Szorc [:gps] from comment #7) > @@ +389,5 @@ > > + let mulBackoff = this._backoff_multiplier * backoff; > > + if (mulBackoff < maxBackoff) { > > + backoff = mulBackoff; > > + this._backoff_multiplier *= 2; > > + } > > I think this would be better written as: > > backoff = Math.min(this._backoffMultiplier * backoff, maxBackoff); > this._backoffMultiplier *= 2; > While cleaner, this always increments the backoff counter, instead of only when the multiplied backoff is less than the max backoff. This will cause problems with decrementing the backoff multiplier after a series of successful requests. This would make the client have to have the same number of successful requests as bad requests to make the multiplier reset. The max backoff check prevents the multiplier from getting too high, and allows the backoff to 'cool off' faster.
(In reply to Gregory Szorc [:gps] from comment #7) > ::: services/aitc/modules/manager.js > @@ +57,5 @@ > > > > + // Used for testing. > > + if (premadeClient) { > > + self._client = premadeClient; > > + return; > > So the callback doesn't get invoked if using the premade client? > > You should document the parameter in the docblock above the function > definition. > This conditional is only used for testing. Adding cb(true); within the conditional doesn't affect the code in testing or production either way. Maybe for future devs it might help so I'll add it. I added the documentation in my next patch for bug 760910, https://bugzilla.mozilla.org/attachment.cgi?id=640699&action=diff#a/services/aitc/modules/manager.js_sec2, should I move part of the comment in here, then modify my submitted patch for bug 760910, that way this patch has the premadeClient info with it?
(In reply to Gregory Szorc [:gps] from comment #9) > > One of the unit tests tried to sync an app without first installing it, and > > it caused a JS exception in this function. The case should actually never > > occur (i.e. the manifest for an installed app missing), so this check > > ensures that we catch that and fail appropriately. > > You shouldn't need to verify that DOMApplicationRegistry's API is behaving > properly. > > If a test was doing something wrong, fix the test. This is just a standard case of "verify the return value of a function you call", in this case it happens to be asynchronous. DOMApplicationRegistry is not incorrect if it doesn't return a manifest, it means the app isn't installed. This code is safer, the version without the check made it hard to debug.
(In reply to Nick Desaulniers [:\n] from comment #10) > (In reply to Gregory Szorc [:gps] from comment #7) > > @@ +389,5 @@ > > > + let mulBackoff = this._backoff_multiplier * backoff; > > > + if (mulBackoff < maxBackoff) { > > > + backoff = mulBackoff; > > > + this._backoff_multiplier *= 2; > > > + } > > > > I think this would be better written as: > > > > backoff = Math.min(this._backoffMultiplier * backoff, maxBackoff); > > this._backoffMultiplier *= 2; > > > > While cleaner, this always increments the backoff counter, instead of only > when the multiplied backoff is less than the max backoff. This will cause > problems with decrementing the backoff multiplier after a series of > successful requests. This would make the client have to have the same > number of successful requests as bad requests to make the multiplier reset. > The max backoff check prevents the multiplier from getting too high, and > allows the backoff to 'cool off' faster. Good catch. So what about: this._backoffMultiplier = Math.min(this._backoffMultiplier * 2, MAXIMUM_BACKOFF_MULTIPLIER) (In reply to Nick Desaulniers [:\n] from comment #11) > (In reply to Gregory Szorc [:gps] from comment #7) > > ::: services/aitc/modules/manager.js > > @@ +57,5 @@ > > > > > > + // Used for testing. > > > + if (premadeClient) { > > > + self._client = premadeClient; > > > + return; > > > > So the callback doesn't get invoked if using the premade client? > > > > You should document the parameter in the docblock above the function > > definition. > > > > This conditional is only used for testing. Adding cb(true); within the > conditional doesn't affect the code in testing or production either way. > Maybe for future devs it might help so I'll add it. > > I added the documentation in my next patch for bug 760910, > https://bugzilla.mozilla.org/attachment.cgi?id=640699&action=diff#a/services/ > aitc/modules/manager.js_sec2, should I move part of the comment in here, > then modify my submitted patch for bug 760910, that way this patch has the > premadeClient info with it? Always commit related changes together. Please decide which of this or bug 760909 you will be committing first and have me review that.(In reply to Anant Narayanan [:anant] from comment #12) > (In reply to Gregory Szorc [:gps] from comment #9) > > > One of the unit tests tried to sync an app without first installing it, and > > > it caused a JS exception in this function. The case should actually never > > > occur (i.e. the manifest for an installed app missing), so this check > > > ensures that we catch that and fail appropriately. > > > > You shouldn't need to verify that DOMApplicationRegistry's API is behaving > > properly. > > > > If a test was doing something wrong, fix the test. > > This is just a standard case of "verify the return value of a function you > call", in this case it happens to be asynchronous. DOMApplicationRegistry is > not incorrect if it doesn't return a manifest, it means the app isn't > installed. > > This code is safer, the version without the check made it hard to debug. OK. I didn't realize this was a correct "return" value from DOMApplicationRegistry. In this case, no need to change anything.
(In reply to Gregory Szorc [:gps] from comment #13) > (In reply to Nick Desaulniers [:\n] from comment #10) > > (In reply to Gregory Szorc [:gps] from comment #7) > Good catch. So what about: > > this._backoffMultiplier = Math.min(this._backoffMultiplier * 2, > MAXIMUM_BACKOFF_MULTIPLIER) > The maximum backoff multiplier will change based on the maximum backoff (set by pref, with a const default), so it shouldn't be constant, or settable in a pref. By only incrementing it when necessary, we prevent a user/dev from setting the MAXIMUM_BACKOFF_MULTIPLIER to a number that is too low to ever reach the backoff.max (which is already a user pref). For instance Anant might set the MAXIMUM_BACKOFF_MULTIPLIER const to 5x, but then change his backoff.max pref to 2 hrs. If the backoff.inital pref is set to 1 min, the backoffs will look like this: failure | backoff | multiplier 1 | 2 min | 4 2 | 4 min | 5 3 | 5 min | 5 4 | 5 min | 5 ....etc. so it never reaches the backoff.max of 2hrs, even though he set it as a pref. I don't trust Anant's math skills, so I would prefer if we only increment it when necessary (as in not changing this), removing the dependency that can get messed up down the road. What do you think?
(In reply to Nick Desaulniers [:\n] from comment #14) > By only incrementing it when necessary, we prevent a user/dev from > setting the MAXIMUM_BACKOFF_MULTIPLIER to a number that is too low to ever > reach the backoff.max (which is already a user pref). We shouldn't be too worried about accidental values here. The main reason things are prefs is so if we royally screw things up, we can push a hotfix extension to all Firefox installs which adjusts the default values. If we don't use prefs, you need to roll a whole new Firefox release. All this talk about the backoff has got me thinking. Why do we even have a variable multiplier? Can we just always multiple the existing backoff by 2? The server *should* be sending an explicit backoff time anyway. So, the default value from the pref should only be needed in cases where the server is royally messed up. And, as I stated above, I'm not too worried about millions of clients changing the default initial backoff value to something ridiculously low.
As per an IRC conversation between myself, :gps, and :anant, the client will not start backing off until 3 consecutive failures (configurable through a pref, default to 3 through a const) where the initial backoff is 10 minutes (configurable through a pref, default to 10min through a const) and double for each failure. There should still be a maximum backoff.
Attached patch rev4 (obsolete) (deleted) — Splinter Review
Attachment #639367 - Attachment is obsolete: true
Attachment #641520 - Flags: review?(gps)
Comment on attachment 641520 [details] [diff] [review] rev4 Review of attachment 641520 [details] [diff] [review]: ----------------------------------------------------------------- Please move the server changes into a separate patch. Those can be committed separately. No sense waiting on committing ready code. For the next review of the backoff code, please ask :telliott for feedback. He will give managerial blessing of the backoff from the server side. This may involve him deferring to rfkelly, Ops, etc. ::: services/aitc/modules/client.js @@ +44,5 @@ > this._backoff = !!state.get("backoff", false); > + this._backoffTime = 0; > + this._consecutiveFailures = 0; > + this._maxFailures = state.get("requestFailureThreshold", > + DEFAULT_REQUEST_FAILURE_THRESHOLD); Why is _maxFailures a variable? AFAICT it is never mutated. Use the constant directly? @@ +387,5 @@ > val = req.response.headers["X-Backoff"]; > backoff = parseInt(val, 10); > this._log.warn("X-Backoff header was seen: " + val); > + } else if (successfulStatusCodes.indexOf(req.response.status) === -1) { > + // Bad status code Nit: Periods on all comments (yes, I thought this was silly when I first wrote patches too). It generally helps non-English speakers from understanding the code. @@ +400,5 @@ > + ); > + } else if (this._consecutiveFailures === this._maxFailures) { > + // initialize > + backoff = this._state.get("backoff.initial", DEFAULT_INITIAL_BACKOFF); > + } Please put this branch first since it will logically get executed first. @@ +403,5 @@ > + backoff = this._state.get("backoff.initial", DEFAULT_INITIAL_BACKOFF); > + } > + } else { > + // Good Status Code > + this._consecutiveFailures = 0; I wonder if there are any times where we could oscillate between good and bad server responses. If so, the backoff logic as coded would never set in. At the least, we should document this known issue. I would like more eyes on this. @@ +414,2 @@ > this._state.set("backoff", "" + (time + backoff)); > + this._backoffTime = backoff; This causes the backoff to be more than doubled because you are adding to backoff here. At the worst, we'd be multiplying backoff by 3.99... instead of 2. I'm wondering why we even need this fuzzing. AFAICT every client decides its own backoff interval based on when it issued a request. If the server is stupid enough to send a Retry-After or X-Backoff time that resolves to the same wall time for every client (e.g. the value decreases based on targeting a specific time in the future), then it is shooting itself in the foot and deserves to die from the thundering herd. Am I missing something here? ::: services/aitc/modules/manager.js @@ +59,5 @@ > > + // Used for testing. > + if (premadeClient) { > + self._client = premadeClient; > + cb(true); I know you were following convention, but why does the callback take true here? Our callback convention is for the first argument to be truthy only if an error occurred. Unless Anant or someone else says otherwise, can you create a part 1 to change this callback convention to pass a null or even nothing at all? ::: services/aitc/tests/unit/test_aitc_manager.js @@ +69,5 @@ > + }; > + > + let client = new AitcClient(token, new Preferences("services.aitc.client.")); > + > + return client; Nit: return new AitcClient(...) @@ +106,5 @@ > + lastRequestTime = Date.now(); > + } else { > + timeNow = Date.now(); > + timeDiff = timeNow - lastRequestTime; > + lastRequestTime = timeNow; Nit: assign these outside of if {}, get rid of first levels in nested if. ::: services/common/aitcserver.js @@ +235,5 @@ > + if (this.mockStatus.code && this.mockStatus.method) { > + this._respondWithMockStatus(request, response); > + return; > + } > + this._onRequest(); I'd just move this._onRequest() to the first thing the function does and remove the call from _respondWithMockStatus(). @@ +274,5 @@ > + if (this.mockStatus.code && this.mockStatus.method) { > + this._respondWithMockStatus(request, response); > + return; > + } > + this._onRequest(); Ditto. ::: services/common/tests/unit/test_aitc_server.js @@ +167,5 @@ > }); > }); > + > +add_test(function test_respond_with_mock_status() { > + let username = '123' Nit: Double quotes.
Attachment #641520 - Flags: review?(gps)
CC telliott for backoff implementation sign-off.
I think the algorithm is fine, and trying to come up with something more complex isn't worth it - we should be sending you backoff numbers when needed, and if you're getting flapping results, the backoff interval is large enough that it should give us reasonable time to figure out why. If you're really worried, have a success decrement the backoff counter by 1 until you're back to 0, but that could make recovery after a max-backoff event very slow. I will note that your success criteria is probably too narrow, and various 300 and 400 statuses don't require backoff. In particular, 403 (need ToS) is not something you should back off for.
Comment on attachment 641644 [details] [diff] [review] add mock request capability to AITC server Review of attachment 641644 [details] [diff] [review]: ----------------------------------------------------------------- I'll commit this for you.
Attachment #641644 - Flags: review?(gps) → review+
AITC server changes: https://hg.mozilla.org/services/services-central/rev/3eddb4087f85 Leaving bug open for the meat.
(In reply to Gregory Szorc [:gps] from comment #18) > ::: services/aitc/modules/client.js > @@ +44,5 @@ > > this._backoff = !!state.get("backoff", false); > > + this._backoffTime = 0; > > + this._consecutiveFailures = 0; > > + this._maxFailures = state.get("requestFailureThreshold", > > + DEFAULT_REQUEST_FAILURE_THRESHOLD); > > Why is _maxFailures a variable? AFAICT it is never mutated. Use the constant > directly? > We want it to be an instance variable of the Manager since state is passed to the Manager constructor function. If I make it a const local to the client.js file (a few lines above where it currently is), then I don't have access to state, which means that you could not use a pref to change the failure threshold.
(In reply to Gregory Szorc [:gps] from comment #18) > ::: services/common/aitcserver.js > @@ +235,5 @@ > > + if (this.mockStatus.code && this.mockStatus.method) { > > + this._respondWithMockStatus(request, response); > > + return; > > + } > > + this._onRequest(); > > I'd just move this._onRequest() to the first thing the function does and > remove the call from _respondWithMockStatus(). > > @@ +274,5 @@ > > + if (this.mockStatus.code && this.mockStatus.method) { > > + this._respondWithMockStatus(request, response); > > + return; > > + } > > + this._onRequest(); > > Ditto. > Unfortunately, this logic is needed. If a mockStatus is set, we want to set the headers, then call the onRequest callback AND return instead of continuing through _generalHandler or _userHandler. There may be a better way to do this, such as a if else, but then I'll have blame for all of the lines I indent.
(In reply to Nick Desaulniers [:\n] from comment #25) > Unfortunately, this logic is needed. If a mockStatus is set, we want to set > the headers, then call the onRequest callback AND return instead of > continuing through _generalHandler or _userHandler. OK. > There may be a better > way to do this, such as a if else, but then I'll have blame for all of the > lines I indent. You shouldn't generally worry about breaking blame.
Attachment #641520 - Attachment is obsolete: true
Attachment #642008 - Flags: review?(gps)
Attachment #642008 - Flags: feedback?(telliott)
Comment on attachment 642008 [details] [diff] [review] Added Client Driven Backoff to AITC Client Review of attachment 642008 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me.
Attachment #642008 - Flags: review?(gps) → review+
Comment on attachment 642008 [details] [diff] [review] Added Client Driven Backoff to AITC Client Still a little concerned about the fairly restrictive success status codes, but the logic looks correct to me.
Attachment #642008 - Flags: feedback?(telliott) → feedback+
Comment on attachment 642008 [details] [diff] [review] Added Client Driven Backoff to AITC Client Review of attachment 642008 [details] [diff] [review]: ----------------------------------------------------------------- ::: services/aitc/modules/client.js @@ +377,3 @@ > _setBackoff: function _setBackoff(req) { > let backoff = 0; > + let successfulStatusCodes = [200, 201, 204]; Need to add 304.
Whiteboard: [blocking-aitc+] → [blocking-aitc+], [fixed in services]
Is this possible to verify? If so, can you clarify what I should be watching out for?
Whiteboard: [blocking-aitc+], [fixed in services] → [blocking-aitc+], [fixed in services], [qa?]
(In reply to Jason Smith [:jsmith] from comment #32) > Is this possible to verify? If so, can you clarify what I should be watching > out for? The automated test verifies this functionality. It is not possible to manually verify without a custom AITC server setup.
Whiteboard: [blocking-aitc+], [fixed in services], [qa?] → [blocking-aitc+], [fixed in services], [qa-]
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [blocking-aitc+], [fixed in services], [qa-] → [blocking-aitc+][qa-]
Product: Web Apps → Web Apps Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: