Bug 1034377 (Closed): "Lock wait timeout exceeded" produces a 500 error; should be 409
Opened 10 years ago; closed 10 years ago
Component: Cloud Services Graveyard :: Server: Sync (defect)
Tracking: not tracked
Status: VERIFIED FIXED
Reporter: rfkelly; Assignee: rfkelly
Whiteboard: [qa+]
Attachments (1): syncstorage-better-conflict-errors.diff (patch, deleted); telliott: review+; bobm: feedback+
Traceback from loadtest:
Traceback (most recent call last):
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/SyncStorage-1.5.6-py2.6.egg/syncstorage/storage/sql/dbconnect.py", line 422, in report_backend_errors_wrapper
    return func(self, *args, **kwds)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/SyncStorage-1.5.6-py2.6.egg/syncstorage/storage/sql/dbconnect.py", line 530, in execute
    return self._exec_with_cleanup(connection, query_str, **params)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/mozsvc/metrics.py", line 183, in timed_func
    return func(*args, **kwds)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/SyncStorage-1.5.6-py2.6.egg/syncstorage/storage/sql/dbconnect.py", line 550, in _exec_with_cleanup
    return connection.execute(sqltext(query_str), **params)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 720, in execute
    return meth(self, multiparams, params)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/sql/elements.py", line 317, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 817, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 947, in _execute_context
    context)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 1108, in _handle_dbapi_exception
    exc_info
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/base.py", line 940, in _execute_context
    context)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/sqlalchemy/engine/default.py", line 435, in do_execute
    cursor.execute(statement, parameters)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/cursors.py", line 132, in execute
    result = self._query(query)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/cursors.py", line 271, in _query
    conn.query(q)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 726, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 861, in _read_query_result
    result.read()
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 1064, in read
    first_packet = self.connection._read_packet()
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 826, in _read_packet
    packet.check_error()
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/connections.py", line 370, in check_error
    raise_mysql_exception(self._data)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/err.py", line 116, in raise_mysql_exception
    _check_mysql_exception(errinfo)
  File "/media/ephemeral0/server-syncstorage/lib/python2.6/site-packages/pymysql/err.py", line 112, in _check_mysql_exception
    raise InternalError(errno, errorvalue)
InternalError: (InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction') '/* [queryName=LOCK_COLLECTION_WRITE, retry=1] */ SELECT last_modified FROM user_collections WHERE userid=%s AND collection=%s FOR UPDATE' (567217, 6)
This should have been caught and turned into a "409 Conflict" but it doesn't look like it was. Probably not catching the right error subclass.
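To illustrate the "not catching the right error subclass" theory: pymysql raises errno 1205 as an InternalError, so a handler that only catches a narrower DB-API subclass (e.g. OperationalError) lets it escape as a 500. This is a hypothetical, self-contained sketch with stand-in exception classes, not the actual syncstorage code.

```python
# Stand-ins for the pymysql exception hierarchy (illustrative only).
class DBError(Exception):
    def __init__(self, errno, message):
        super().__init__(errno, message)
        self.errno = errno

class OperationalError(DBError):
    pass

class InternalError(DBError):
    pass

MYSQL_LOCK_WAIT_TIMEOUT = 1205

def execute_query():
    # pymysql raised InternalError (not OperationalError) for errno 1205.
    raise InternalError(MYSQL_LOCK_WAIT_TIMEOUT,
                        "Lock wait timeout exceeded; try restarting transaction")

def handle_narrow():
    """Catches only OperationalError, so the 1205 escapes as a 500."""
    try:
        return execute_query()
    except OperationalError:
        return 409  # never reached for InternalError

def handle_by_errno():
    """Catch the shared base class and dispatch on the MySQL errno."""
    try:
        return execute_query()
    except DBError as e:
        if e.errno == MYSQL_LOCK_WAIT_TIMEOUT:
            return 409
        raise
```

Catching the common base class and inspecting the errno sidesteps the question of which subclass a given driver picks for a given error code.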
Updated • 10 years ago (Assignee)
Assignee: nobody → rfkelly
Updated • 10 years ago (Assignee)
Summary: "Lock wait timeout exceeded" prouces a 500 error; should be 409 → "Lock wait timeout exceeded" produces a 500 error; should be 409
Updated • 10 years ago
Whiteboard: [qa+]
Comment 1 • 10 years ago (Assignee)
AFAICT, the fact that this traceback starts in "report_backend_errors_wrapper" rather than in the top-level "catch_backend_errors_tween" means that it's logging the error before (correctly) turning it into a BackendError instance. So maybe this is actually being caught and reported properly, and it's just the logged traceback that is confusing.
I'll dig a little more into the behaviour here though.
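The pattern described here can be sketched as follows: an inner wrapper logs the raw traceback and re-raises a BackendError, which an outer tween converts into an HTTP response. The logged traceback then looks "uncaught" even though the error is handled upstream. Names mirror those in the comment, but the bodies are illustrative stand-ins, not the real mozsvc/syncstorage implementations.

```python
import functools
import logging
import traceback

class BackendError(Exception):
    pass

def report_backend_errors_wrapper(func):
    """Log the original traceback, then re-raise as BackendError."""
    @functools.wraps(func)
    def wrapper(*args, **kwds):
        try:
            return func(*args, **kwds)
        except Exception:
            # This is the traceback that shows up in the logs, even
            # though the error is about to be caught further up.
            logging.error("backend error:\n%s", traceback.format_exc())
            raise BackendError()
    return wrapper

def catch_backend_errors_tween(handler):
    """Top-level tween: map BackendError to an HTTP error response."""
    def tween(request):
        try:
            return handler(request)
        except BackendError:
            return 503  # stand-in for a proper HTTP response object
    return tween
```

Under this structure, a logged traceback starting inside report_backend_errors_wrapper is consistent with the error being handled correctly.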
Comment 2 • 10 years ago (Assignee)
Here are some improvements to our "409 Conflict" generation, based on my investigation of this traceback:
* Add an explicit message when logging such tracebacks, so that we can more easily distinguish "uncaught error" from "caught and logged".
* Treat "1213: deadlock found when trying to get lock" as a retryable locking-related error
* Move the "db locking error" => "ConflictError" mapping down onto the session object, so we correctly detect these if they show up on any query rather than just on the lock-taking query
These changes helped the "409 Conflict" response show up more reliably in my local testing on both sqlite and mysql (as opposed to these errors being reported as a generic "503 Service Unavailable").
Pinging :callahad for review in the interests of disseminating knowledge, and :bobm for feedback to sanity-check adding "1213" to the list of retryable errors.
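The session-level mapping in the third bullet could look roughly like this. All names (Session, ConflictError, DBAPIError) are illustrative stand-ins rather than the actual syncstorage classes; the point is that routing every query through one execute() method means errnos 1205 and 1213 are detected regardless of which query raises them.

```python
# MySQL locking errors worth retrying / surfacing as a conflict:
# 1205 = lock wait timeout exceeded, 1213 = deadlock found.
RETRYABLE_LOCKING_ERRNOS = {1205, 1213}

class ConflictError(Exception):
    """Surfaces to the client as a "409 Conflict" response."""

class DBAPIError(Exception):
    def __init__(self, errno, message):
        super().__init__(errno, message)
        self.errno = errno

class Session:
    """Every query goes through execute(), so locking errors are
    caught no matter which query triggers them, not just the
    lock-taking query."""

    def __init__(self, connection):
        self._connection = connection

    def execute(self, query, **params):
        try:
            return self._connection.execute(query, **params)
        except DBAPIError as e:
            if e.errno in RETRYABLE_LOCKING_ERRNOS:
                raise ConflictError(str(e)) from e
            raise
```

Non-locking database errors still propagate unchanged, so they can be reported as a generic backend failure (503) rather than a conflict.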
Attachment #8457839 - Flags: review?(dan.callahan)
Attachment #8457839 - Flags: feedback?(bobm)
Comment 3 • 10 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #2)
> Pinging :callahad for review in the interests of disseminating knowledge,
> and :bobm for feedback to sanity-check adding "1213" to the list of
> retryable errors.
Sounds good to me. Is there / will there be a retry limit?
Comment 4 • 10 years ago (Assignee)
> Sounds good to me. Is there / will there be a retry limit?
Yes, this flows into the existing retry logic which will retry exactly once, then fail back to the client with a 503 or 409, depending on the error.
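The retry flow described here ("retry exactly once, then fail back to the client") can be sketched generically. This is a hypothetical illustration of the flow, not the actual syncstorage retry code; the function names and the retryability predicate are invented for the example.

```python
def execute_with_retry(func, is_retryable, max_retries=1):
    """Run func(); retry at most max_retries times on retryable
    errors, then let the error propagate (to become a 409 or 503)."""
    attempts = 0
    while True:
        try:
            return func()
        except Exception as e:
            attempts += 1
            if attempts > max_retries or not is_retryable(e):
                raise
```

With max_retries=1, a transient lock timeout costs one extra query; a persistent one fails after exactly two attempts and surfaces to the client.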
Comment 5 • 10 years ago
Comment on attachment 8457839 [details] [diff] [review]
syncstorage-better-conflict-errors.diff
Sounds like a good strategy to me.
Attachment #8457839 - Flags: feedback?(bobm) → feedback+
Comment 6 • 10 years ago (Assignee)
Comment on attachment 8457839 [details] [diff] [review]
syncstorage-better-conflict-errors.diff
Bumping review to :telliott; let's get this landed before the next round of deployments.
Attachment #8457839 - Flags: review?(dan.callahan) → review?(telliott)
Updated • 10 years ago
Attachment #8457839 - Flags: review?(telliott) → review+
Comment 7 • 10 years ago
Bear in mind that desktop currently doesn't support 409 Conflict: Bug 959034.
Comment 8 • 10 years ago (Assignee)
> Bear in mind that desktop currently doesn't support 409 Conflict: Bug 959034.
Ugh, I had forgotten about the error-bar aspect of that. I'll ponder the implications before committing.
Comment 9 • 10 years ago (Assignee)
I committed this because it gives better internal retry handling, and because we already generate 409s under some circumstances anyway:
https://github.com/mozilla-services/server-syncstorage/commit/ee8349145405285bb0f4b78b28320c272fec156f
There's some risk that this might surface an error bar in situations where it wouldn't previously, but it seems acceptably small to me. We should separately consider the best path forward there.
Updated • 10 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 10 • 10 years ago
OK. I checked my notes. I do not see any specific deployments coming up, so I will just hold this one.
Comment 11 • 10 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #9)
> I committed this because it gives better internal retry handling, and
> because we already generate 409s under some circumstances anyway:
Should we add a graph with a 409 + UA break down?
Comment 12 • 10 years ago (Assignee)
> Should we add a graph with a 409 + UA break down?
Nah. See Bug 1051739 where I just disabled them entirely until we get proper client support.
Updated • 2 years ago
Product: Cloud Services → Cloud Services Graveyard