Closed
Bug 1062120
Opened 10 years ago
Closed 8 years ago
Fail the whole POST request if we get a DB error
Categories
(Cloud Services Graveyard :: Server: Sync, defect, P3)
Cloud Services Graveyard
Server: Sync
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rfkelly, Unassigned)
References
Details
(Whiteboard: [qa?])
Attachments
(1 file, 1 obsolete file)
(deleted),
text/x-github-pull-request
|
Details |
If we get a db error like "lock wait timeout" during a batch POST request, the current behaviour is to return a "200 OK" but put every single bso in the "failed" list. This leads to confusing logs on both client and server, because the request appears to have succeeded.
Let's fix it so that a DB error just fails the whole request with a 503.
There's no chance of some objects begin successfully written while others are not, and AFAICT nothing actually depends on the current behaviour, it's just a hangover from an implementation detail of sync1.1
/cc Mark as a heads-up but I will be absolutely and completely shocked if this breaks anything in the client, since 503 is a perfectly valid response.
Reporter | ||
Comment 1•10 years ago
|
||
Posting the code so I don't lose track of it, but I'm going to find someone other than Toby to review this non-urgent, not-particularly-complex patch in order to continue spreading knowledge of the codebase. Which means waiting for people to come back from PTO.
Assignee: nobody → rfkelly
Updated•10 years ago
|
Whiteboard: [qa?]
Comment 2•10 years ago
|
||
Bear in mind that this'll put the client into an aborting state -- you'll supply a Retry-After with the 503 to avoid an error bar, the client will cancel the sync and back off, and (IIRC) it'll re-download whatever records it had already fetched from the failing engine next time it syncs.
The current behavior results in the least work from the client -- it'll track the failed records and upload them next time.
Reporter | ||
Comment 3•10 years ago
|
||
> Bear in mind that this'll put the client into an aborting state -- you'll supply a Retry-After with
> the 503 to avoid an error bar, the client will cancel the sync and back off
Hmmm...this is pretty much what we want from the server POV, because this error would already surface as a 503 on any other request. But I see your point re: extra work for the client.
For desktop, AFAICT from _sync(), _processIncoming() and _uploadOutgoing() in sync/modules/engines.js, the behavior is to:
* fetch all incoming records
* set lastSync timestamp so that we don't fetch them again
* upload outgoing records; for each batch:
* post it to the server
* set lastSync timestamp so that we don't fetch them next time
This seems OK to my naive eye, as it zips this.lastSync forward after each useful piece of work. (Which is of course racy as all hell, and should be using X-If-Unmodified-Since, but that's another story...)
I don't know about the android code, but it usually has nicer behaviour than desktop :-)
Reporter | ||
Comment 4•10 years ago
|
||
So we had a bit of inadvertent experience with this in prod, after the combo of Bug 1034377 and Bug 1051739 caused a bunch of additional 503s to show up on post requests. The client behavior has not been awesome, with increased user reports of sync errors.
I still think this bug is the Right Thing, but it should wait until we get client-side handling of 409s so that we only fail the sync on more substantial errors.
Reporter | ||
Updated•10 years ago
|
Reporter | ||
Comment 5•9 years ago
|
||
abandoning, adding :dcoates for context, although we can't really action this until we get better client support for continuing the sync via the depended-on bugs.
Assignee: rfkelly → nobody
Reporter | ||
Comment 6•9 years ago
|
||
Now that sync failures are error-bar-free and much less noisy in general, I think we should re-evaluate this. It will make for clearer logs on the client, and improved backoff response to help the server cope with load. :markh, what do you think?
Comment 7•9 years ago
|
||
comment 3 sounds correct, so I don't see a good reason to not do this, especially if it helps keep the servers sane and stable.
Updated•9 years ago
|
Flags: firefox-backlog+
Priority: -- → P2
Reporter | ||
Comment 8•9 years ago
|
||
I updated this diff for the latest state of master; this should prevent any 503s from hiding as "successful" requests in sync logs.
Attachment #8483303 -
Attachment is obsolete: true
Attachment #8728232 -
Flags: review?(dcoates)
Updated•8 years ago
|
Priority: P2 → P3
Comment 10•8 years ago
|
||
I'm sorry, I don't recall enough context to make sense of this. Please find someone else to review this.
Flags: needinfo?(dcoates)
Updated•8 years ago
|
Attachment #8728232 -
Flags: review?(dcoates) → review?
Reporter | ||
Comment 11•8 years ago
|
||
ni? myself to pick this up and find another reviewer
Flags: needinfo?(rfkelly)
Reporter | ||
Comment 12•8 years ago
|
||
Comment on attachment 8728232 [details]
PR to fail the entire POST request on a db error
:rtilder, since you've been around in the syncserver codebase recently, could you please review this? The idea here is to return a 503 error response when there's a DB error, rather than the current behaviour of returning a "200 OK" and listing all the items as "failed".
Flags: needinfo?(rfkelly)
Attachment #8728232 -
Flags: review? → review?(rtilder)
Reporter | ||
Comment 13•8 years ago
|
||
Comment on attachment 8728232 [details]
PR to fail the entire POST request on a db error
I merged this r=Natim over in https://github.com/mozilla-services/server-syncstorage/pull/29
Attachment #8728232 -
Flags: review?(rtilder)
Reporter | ||
Updated•8 years ago
|
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•2 years ago
|
Product: Cloud Services → Cloud Services Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•