Closed Bug 1062120 Opened 10 years ago Closed 8 years ago

Fail the whole POST request if we get a DB error

Categories

(Cloud Services Graveyard :: Server: Sync, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rfkelly, Unassigned)

References

Details

(Whiteboard: [qa?])

Attachments

(1 file, 1 obsolete file)

If we get a db error like "lock wait timeout" during a batch POST request, the current behaviour is to return a "200 OK" but put every single bso in the "failed" list. This leads to confusing logs on both client and server, because the request appears to have succeeded. Let's fix it so that a DB error just fails the whole request with a 503. There's no chance of some objects begin successfully written while others are not, and AFAICT nothing actually depends on the current behaviour, it's just a hangover from an implementation detail of sync1.1 /cc Mark as a heads-up but I will be absolutely and completely shocked if this breaks anything in the client, since 503 is a perfectly valid response.
Attached patch syncstorage-fail-whole-request.diff (obsolete) (deleted) — Splinter Review
Posting the code so I don't lose track of it, but I'm going to find someone other than Toby to review this non-urgent, not-particularly-complex patch in order to continue spreading knowledge of the codebase. Which means waiting for people to come back from PTO.
Assignee: nobody → rfkelly
Whiteboard: [qa?]
Bear in mind that this'll put the client into an aborting state -- you'll supply a Retry-After with the 503 to avoid an error bar, the client will cancel the sync and back off, and (IIRC) it'll re-download whatever records it had already fetched from the failing engine next time it syncs. The current behavior results in the least work from the client -- it'll track the failed records and upload them next time.
> Bear in mind that this'll put the client into an aborting state -- you'll supply a Retry-After with > the 503 to avoid an error bar, the client will cancel the sync and back off Hmmm...this is pretty much what we want from the server POV, because this error would already surface as a 503 on any other request. But I see your point re: extra work for the client. For desktop, AFAICT from _sync(), _processIncoming() and _uploadOutgoing() in sync/modules/engines.js, the behavior is to: * fetch all incoming records * set lastSync timestamp so that we don't fetch them again * upload outgoing records; for each batch: * post it to the server * set lastSync timestamp so that we don't fetch them next time This seems OK to my naive eye, as it zips this.lastSync forward after each useful piece of work. (Which is of course racy as all hell, and should be using X-If-Unmodified-Since, but that's another story...) I don't know about the android code, but it usually has nicer behaviour than desktop :-)
So we had a bit of inadvertent experience with this in prod, after the combo of Bug 1034377 and Bug 1051739 caused a bunch of additional 503s to show up on post requests. The client behavior has not been awesome, with increased user reports of sync errors. I still think this bug is the Right Thing, but it should wait until we get client-side handling of 409s so that we only fail the sync on more substantial errors.
Depends on: 959034, 959032
abandoning, adding :dcoates for context, although we can't really action this until we get better client support for continuing the sync via the depended-on bugs.
Assignee: rfkelly → nobody
Now that sync failures are error-bar-free and much less noisy in general, I think we should re-evaluate this. It will make for clearer logs on the client, and improved backoff response to help the server cope with load. :markh, what do you think?
comment 3 sounds correct, so I don't see a good reason to not do this, especially if it helps keep the servers sane and stable.
Flags: firefox-backlog+
Priority: -- → P2
I updated this diff for the latest state of master; this should prevent any 503s from hiding as "successful" requests in sync logs.
Attachment #8483303 - Attachment is obsolete: true
Attachment #8728232 - Flags: review?(dcoates)
Danny, are you still going to review this?
Flags: needinfo?(dcoates)
Priority: P2 → P3
I'm sorry, I don't recall enough context to make sense of this. Please find someone else to review this.
Flags: needinfo?(dcoates)
Attachment #8728232 - Flags: review?(dcoates) → review?
ni? myself to pick this up and find another reviewer
Flags: needinfo?(rfkelly)
Comment on attachment 8728232 [details] PR to fail the entire POST request on a db error :rtilder, since you've been around in the syncserver codebase recently, could you please review this? The idea here is to return a 503 error response when there's a DB error, rather than the current behaviour of returning a "200 OK" and listing all the items as "failed".
Flags: needinfo?(rfkelly)
Attachment #8728232 - Flags: review? → review?(rtilder)
Comment on attachment 8728232 [details] PR to fail the entire POST request on a db error I merged this r=Natim over in https://github.com/mozilla-services/server-syncstorage/pull/29
Attachment #8728232 - Flags: review?(rtilder)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: