Closed
Bug 1275425
Opened 8 years ago
Closed 8 years ago
OperationalError(1366, "Incorrect string value: '\\xF0\\x9D\\x90\\x80\\xF0\\x9D...' for column 'message' at row 1")
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: jgraham)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
<jgraham> In [3]: print "\xF0\x9D\x90\x80".decode("utf8")
<jgraham>
Flags: needinfo?(james)
Reporter • Comment 1 • 8 years ago
Eugh, the Unicode broke Bugzilla.
<jgraham> In [3]: print "\xF0\x9D\x90\x80".decode("utf8")
<jgraham> <REDACTED since it breaks bugzilla>
<jgraham> Well that didn't show up so well here but it's totally a legit character
<jgraham> So this is just the usual MySQL terribleness
<jgraham> We need to switch that field to be utf8mb4
<jgraham> https://mathiasbynens.be/notes/mysql-utf8mb4 is the canonical write up. But https://code.djangoproject.com/ticket/18392 makes it sound like django itself doesn't support this correctly, or something
<jgraham> Which is pretty unbelievable
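For reference, the check from the IRC session above (which is Python 2) can be repeated in Python 3 to see why the value is rejected. This is a minimal sketch with illustrative variable names: the bytes decode to a single code point outside the BMP, which takes four bytes in UTF-8, one more than MySQL's "utf8" will store per character.

```python
# The leading bytes from the error message, as a bytes literal.
raw = b"\xF0\x9D\x90\x80"

char = raw.decode("utf-8")
codepoint = ord(char)

# A single code point above U+FFFF lies outside the Basic Multilingual Plane.
is_astral = codepoint > 0xFFFF

# UTF-8 needs four bytes for it; MySQL's "utf8" stores at most three.
utf8_length = len(char.encode("utf-8"))

print("U+{:04X}".format(codepoint), is_astral, utf8_length)
# prints: U+1D400 True 4
```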
Things we can do:
1) Most urgent: Do not perform a Celery retry for errors of this type, since we know the retry is going to fail. This would at least stop the exception spam and the extra load/backlog that ensues.
2) Filter the messages in the meantime
3) File another bug for the long term fix of changing to utf8mb4
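Option 1 could be sketched roughly as follows. This is an illustration only: `OperationalError` here is a local stand-in for the database driver's exception, and `should_retry` is a hypothetical helper, not Treeherder or Celery API. It relies only on the error code 1366 being the first exception argument, as shown in this bug's summary.

```python
class OperationalError(Exception):
    """Stand-in for the database driver's exception (assumption for this sketch)."""


# MySQL error code from this bug's summary ("Incorrect string value").
INCORRECT_STRING_VALUE = 1366


def should_retry(exc):
    """Return False for errors that are certain to fail again on retry."""
    if isinstance(exc, OperationalError) and exc.args:
        return exc.args[0] != INCORRECT_STRING_VALUE
    # Anything else (for example a transient connection error) stays retryable.
    return True
```

A task's error handler would consult such a check before scheduling a retry, and log and drop the item when it returns False.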
https://rpm.newrelic.com/accounts/677903/applications/4180461/filterable_errors#/show/4f5109-f43cfb99-2203-11e6-b947-b82a72d22a14/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart&_k=f6psad
Comment 2 • 8 years ago
Assignee • Updated • 8 years ago
Flags: needinfo?(james)
Attachment #8756424 - Flags: review?(emorley)
Reporter • Updated • 8 years ago
Attachment #8756424 - Flags: review?(emorley) → review+
Reporter • Updated • 8 years ago
Assignee: nobody → james
Comment 3 • 8 years ago
Commits pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/18d0fdc8aab8c18d9830e03269efc33342b6e5e7
Bug 1275425 - Hacky workaround for issues storing astral characters.
Test names, messages, etc. may contain UTF8 characters from beyond the
Basic Multilingual Plane ("astral" characters). Unfortunately MySQL's
"utf8" character set is nothing of the sort and will only store a
maximum of three bytes per character, thus restricting it to BMP
characters. The correct fix to this is to switch to the utf8mb4
character set. Since such a change is somewhat involved, however, we
address the immediate problem with a hack.
When storing failure lines, if the operation fails for character set
related reasons, try again with any non-BMP characters replaced by a
marker of the form <U+codepoint> e.g. <U+10FFFF>.
Note further that whether MySQL fails here or silently replaces
each byte of the original character with a U+FFFD replacement character
depends on the value of the sql_mode setting. If this is set to
STRICT_ALL_TABLES, we get an error; otherwise we get silent data
loss. It is therefore important that this setting is consistent across
all environments.
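The workaround described in the commit message above can be sketched in Python 3 roughly as follows. This is not the code from the commit: `replace_astral`, `store_with_fallback`, and the `store` and `charset_errors` parameters are hypothetical names for illustration, and the marker's digit padding is an assumption (the commit message only gives <U+10FFFF> as an example).

```python
import re

# Matches any code point outside the Basic Multilingual Plane.
ASTRAL_RE = re.compile("[\U00010000-\U0010FFFF]")


def replace_astral(text):
    """Replace each non-BMP character with a <U+codepoint> marker."""
    return ASTRAL_RE.sub(lambda m: "<U+{:04X}>".format(ord(m.group(0))), text)


def store_with_fallback(store, text, charset_errors=(Exception,)):
    """Try store(text); on a character set error, retry once with markers.

    `store` and `charset_errors` are hypothetical: pass the real write
    function and the driver's exception type(s).
    """
    try:
        return store(text)
    except charset_errors:
        return store(replace_astral(text))
```

The retry only helps when the first attempt raises, which per the commit message requires sql_mode to be STRICT_ALL_TABLES; otherwise the data is silently altered and no fallback is triggered.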
https://github.com/mozilla/treeherder/commit/c3128c01fb65a60b102ba3779290e3c2c5043a42
Merge pull request #1512 from mozilla/utf8_astral_hack
Bug 1275425 - Hacky workaround for issues storing astral characters.
Reporter • Updated • 8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED