Closed Bug 1062120 Opened 10 years ago Closed 8 years ago

Fail the whole POST request if we get a DB error

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: rfkelly, Unassigned)

References

Details

(Whiteboard: [qa?])

Attachments

(1 file, 1 obsolete file)

syncstorage-fail-whole-request.diff 10 years ago Ryan Kelly [:rfkelly] (deleted), patch		Details \| Diff \| Splinter Review
PR to fail the entire POST request on a db error 9 years ago Ryan Kelly [:rfkelly] (deleted), text/x-github-pull-request		Details

Ryan Kelly [:rfkelly]

Reporter

Description

•

10 years ago

If we get a db error like "lock wait timeout" during a batch POST request, the current behaviour is to return a "200 OK" but put every single bso in the "failed" list. This leads to confusing logs on both client and server, because the request appears to have succeeded. Let's fix it so that a DB error just fails the whole request with a 503. There's no chance of some objects begin successfully written while others are not, and AFAICT nothing actually depends on the current behaviour, it's just a hangover from an implementation detail of sync1.1 /cc Mark as a heads-up but I will be absolutely and completely shocked if this breaks anything in the client, since 503 is a perfectly valid response.

Ryan Kelly [:rfkelly]

Reporter

Comment 1

•

10 years ago

Attached patch syncstorage-fail-whole-request.diff (obsolete) (deleted) — Details — Splinter Review

Posting the code so I don't lose track of it, but I'm going to find someone other than Toby to review this non-urgent, not-particularly-complex patch in order to continue spreading knowledge of the codebase. Which means waiting for people to come back from PTO.

Assignee: nobody → rfkelly

James Bonacci [:jbonacci]

Updated

•

10 years ago

Whiteboard: [qa?]

Richard Newman [:rnewman]

Comment 2

•

10 years ago

Bear in mind that this'll put the client into an aborting state -- you'll supply a Retry-After with the 503 to avoid an error bar, the client will cancel the sync and back off, and (IIRC) it'll re-download whatever records it had already fetched from the failing engine next time it syncs. The current behavior results in the least work from the client -- it'll track the failed records and upload them next time.

Ryan Kelly [:rfkelly]

Reporter

Comment 3

•

10 years ago

> Bear in mind that this'll put the client into an aborting state -- you'll supply a Retry-After with > the 503 to avoid an error bar, the client will cancel the sync and back off Hmmm...this is pretty much what we want from the server POV, because this error would already surface as a 503 on any other request. But I see your point re: extra work for the client. For desktop, AFAICT from _sync(), _processIncoming() and _uploadOutgoing() in sync/modules/engines.js, the behavior is to: * fetch all incoming records * set lastSync timestamp so that we don't fetch them again * upload outgoing records; for each batch: * post it to the server * set lastSync timestamp so that we don't fetch them next time This seems OK to my naive eye, as it zips this.lastSync forward after each useful piece of work. (Which is of course racy as all hell, and should be using X-If-Unmodified-Since, but that's another story...) I don't know about the android code, but it usually has nicer behaviour than desktop :-)

Ryan Kelly [:rfkelly]

Reporter

Comment 4

•

10 years ago

So we had a bit of inadvertent experience with this in prod, after the combo of Bug 1034377 and Bug 1051739 caused a bunch of additional 503s to show up on post requests. The client behavior has not been awesome, with increased user reports of sync errors. I still think this bug is the Right Thing, but it should wait until we get client-side handling of 409s so that we only fail the sync on more substantial errors.

Ryan Kelly [:rfkelly]

Reporter

Updated

•

10 years ago

Depends on: 959034, 959032

Ryan Kelly [:rfkelly]

Reporter

Comment 5

•

9 years ago

abandoning, adding :dcoates for context, although we can't really action this until we get better client support for continuing the sync via the depended-on bugs.

Assignee: rfkelly → nobody

Ryan Kelly [:rfkelly]

Reporter

Comment 6

•

9 years ago

Now that sync failures are error-bar-free and much less noisy in general, I think we should re-evaluate this. It will make for clearer logs on the client, and improved backoff response to help the server cope with load. :markh, what do you think?

Mark Hammond [:markh] [:mhammond]

Comment 7

•

9 years ago

comment 3 sounds correct, so I don't see a good reason to not do this, especially if it helps keep the servers sane and stable.

Chris Karlof [:ckarlof]

Updated

•

9 years ago

Flags: firefox-backlog+

Priority: -- → P2

Ryan Kelly [:rfkelly]

Reporter

Comment 8

•

9 years ago

Attached file PR to fail the entire POST request on a db error (deleted) — Details

I updated this diff for the latest state of master; this should prevent any 503s from hiding as "successful" requests in sync logs.

Attachment #8483303 - Attachment is obsolete: true

Attachment #8728232 - Flags: review?(dcoates)

Chris Karlof [:ckarlof]

Comment 9

•

8 years ago

Danny, are you still going to review this?

Flags: needinfo?(dcoates)

Chris Karlof [:ckarlof]

Updated

•

8 years ago

Priority: P2 → P3

Danny Coates [:dcoates]

Comment 10

•

8 years ago

I'm sorry, I don't recall enough context to make sense of this. Please find someone else to review this.

Flags: needinfo?(dcoates)

Danny Coates [:dcoates]

Updated

•

8 years ago

Attachment #8728232 - Flags: review?(dcoates) → review?

Ryan Kelly [:rfkelly]

Reporter

Comment 11

•

8 years ago

ni? myself to pick this up and find another reviewer

Flags: needinfo?(rfkelly)

Ryan Kelly [:rfkelly]

Reporter

Comment 12

•

8 years ago

Comment on attachment 8728232 [details] PR to fail the entire POST request on a db error :rtilder, since you've been around in the syncserver codebase recently, could you please review this? The idea here is to return a 503 error response when there's a DB error, rather than the current behaviour of returning a "200 OK" and listing all the items as "failed".

Flags: needinfo?(rfkelly)

Attachment #8728232 - Flags: review? → review?(rtilder)

Ryan Kelly [:rfkelly]

Reporter

Comment 13

•

8 years ago

Comment on attachment 8728232 [details] PR to fail the entire POST request on a db error I merged this r=Natim over in https://github.com/mozilla-services/server-syncstorage/pull/29

Attachment #8728232 - Flags: review?(rtilder)

Ryan Kelly [:rfkelly]

Reporter

Updated

•

8 years ago

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

2 years ago

Product: Cloud Services → Cloud Services Graveyard

You need to log in before you can comment on or make changes to this bug.