Open Bug 1343630 Opened 8 years ago Updated 5 years ago

Implement longer term solution to one-off MySQL utf8->utf8mb4 commit table conversion in bug 1115608

Categories

(Tree Management :: Treeherder, enhancement, P5)

enhancement

Tracking

(Not tracked)

People

(Reporter: emorley, Unassigned)

References

Details

In bug 1115608, production pushlog ingestion was failing because an emoji had been used in a commit message (a non-BMP, aka astral, Unicode character). In that bug I:
* Manually ran SQL against prototype/stage/prod to convert the `commit` table from utf8 to utf8mb4.
* Landed a PR to update the DATABASES dict in settings.py so that Django sets the client charset to utf8mb4 during the connection handshake itself.

This is fine as a short-term fix to unblock prod, however:
1) The SQL was run manually and not as part of a migration, so the prod/... schema now differs from Vagrant (i.e. partly regressing bug 1303763).
2) We need to make a longer-term decision whether to:
   - use utf8mb4 across all tables
   - use utf8mb4 across some tables (kind of annoying, since this is only possible via RunSQL migrations and not via a native Django feature)
   - not use utf8mb4 at all, and just strip astral characters in the commit table after all (like we currently do for the failure_line table)

Complications for switching are:
* Time to run the conversion on large tables
* Field/index length limits (though in the cases that appeared to be problematic I think we want to change the schema anyway, e.g. job_details)

I'm leaning towards switching all tables, and then we can also remove the failure_line workaround added by bug 1275425.
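To illustrate the underlying problem: MySQL's legacy `utf8` charset (utf8mb3) stores at most 3 bytes per character, while astral characters such as emoji take 4 bytes in UTF-8, so they cannot be stored. The sketch below shows how such characters can be detected and what the "strip astral characters" option might look like. The helper names (`has_astral`, `max_utf8_bytes_per_char`, `strip_astral`) are hypothetical and this is not Treeherder's actual code; the existing failure_line workaround from bug 1275425 may behave differently (for example, substituting a placeholder rather than deleting).

```python
# Hypothetical sketch: detect and strip astral (non-BMP) characters that
# MySQL's 3-byte "utf8" (utf8mb3) charset cannot store.
import re

# Code points above U+FFFF need 4 bytes when encoded as UTF-8.
ASTRAL_RE = re.compile("[\U00010000-\U0010FFFF]")


def has_astral(text: str) -> bool:
    """Return True if text contains any non-BMP (astral) character."""
    return ASTRAL_RE.search(text) is not None


def max_utf8_bytes_per_char(text: str) -> int:
    """Largest UTF-8 encoded length of any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def strip_astral(text: str, replacement: str = "") -> str:
    """Remove astral characters (or swap in a placeholder) so the text
    fits in a utf8mb3 column."""
    return ASTRAL_RE.sub(replacement, text)


if __name__ == "__main__":
    msg = "Fix crash \N{PILE OF POO}"
    print(has_astral(msg))               # True
    print(max_utf8_bytes_per_char(msg))  # 4
    print(repr(strip_astral(msg)))       # 'Fix crash '
```

The trade-off is visible here: stripping keeps the utf8 schema but silently alters commit messages, whereas switching to utf8mb4 stores them unchanged.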
We should totally switch all tables: MySQL's "utf8", which isn't actually full UTF-8, is just broken. Note nox's suggestion from https://bugzilla.mozilla.org/show_bug.cgi?id=1115608#c8 about first converting to binary to speed up the conversion process.
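One possible reading of the "convert to binary first" idea (an assumption; the linked comment isn't reproduced here) is a two-step column change. Converting a text column to a binary type keeps its stored bytes unchanged, and converting it back to a text type declared as utf8mb4 reinterprets those bytes rather than transcoding them; this is safe because every valid utf8mb3 byte sequence is also valid utf8mb4. The hypothetical helper below only builds the SQL strings for one `VARCHAR` column and makes no claim about speed. Because `MODIFY` replaces the whole column definition, attributes such as `NOT NULL` or `DEFAULT` must be restated in each step; only nullability is handled here.

```python
# Hypothetical helper: emit the two-step utf8 -> binary -> utf8mb4 ALTER
# statements for one VARCHAR column. It only builds strings and does not
# connect to a database. Table/column names used below are examples.

UTF8MB3_MAX_BYTES = 3  # MySQL's legacy "utf8" uses at most 3 bytes/char


def binary_roundtrip_sql(table: str, column: str, length: int,
                         nullable: bool = True,
                         collation: str = "utf8mb4_unicode_ci") -> list[str]:
    """Return ALTER statements converting a utf8 VARCHAR(length) to utf8mb4."""
    null_clause = "NULL" if nullable else "NOT NULL"
    # Step 1: VARBINARY sized for the worst-case utf8mb3 byte length, so the
    # existing bytes are kept as-is rather than transcoded.
    to_binary = (
        f"ALTER TABLE `{table}` MODIFY `{column}` "
        f"VARBINARY({length * UTF8MB3_MAX_BYTES}) {null_clause};"
    )
    # Step 2: reinterpret those bytes as utf8mb4 text of the original
    # character length.
    to_utf8mb4 = (
        f"ALTER TABLE `{table}` MODIFY `{column}` "
        f"VARCHAR({length}) CHARACTER SET utf8mb4 COLLATE {collation} "
        f"{null_clause};"
    )
    return [to_binary, to_utf8mb4]


if __name__ == "__main__":
    for stmt in binary_roundtrip_sql("example_table", "example_column", 150,
                                     nullable=False):
        print(stmt)
```

Note that this runs two table rebuilds per column, so whether it beats a single direct conversion would need measuring on real data.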
Assignee: nobody → emorley

I agree switching all tables makes sense.

For anyone working on this in the future, this will require:

  1. Adjusting the mysql.cnf file in the Treeherder repository used by Vagrant, so that the dev DB that gets created uses utf8mb4 instead of utf8
  2. Updating the Terraform config for RDS with the equivalent change from (1): https://github.com/mozilla-platform-ops/devservices-aws/blob/master/treeherder/rds.tf
  3. Actually updating the existing dev/stage/prod DB schemas (since the change in (2) will only apply to new tables) using the techniques described here:
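As a rough sketch of what step (3) might involve (not necessarily the techniques the missing link describes): MySQL's table-level `ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4` changes a table's default charset and converts its text columns in one statement. The sketch also does the arithmetic behind the "field/index length limits" complication: with InnoDB's 767-byte index key prefix limit (COMPACT/REDUNDANT row formats), an indexed utf8mb4 column can be at most 767 // 4 = 191 characters, versus 767 // 3 = 255 for utf8. The table and column names and the `utf8mb4_unicode_ci` collation are assumptions; newer row formats (e.g. DYNAMIC) allow a 3072-byte prefix, which can be passed in as `prefix_limit`.

```python
# Hypothetical sketch for step (3): build per-table conversion statements
# and flag indexed VARCHAR columns that would exceed InnoDB's index prefix
# limit once each character may take 4 bytes.

UTF8MB4_MAX_BYTES = 4
COMPACT_PREFIX_LIMIT = 767  # bytes, COMPACT/REDUNDANT row formats


def convert_table_sql(table: str, collation: str = "utf8mb4_unicode_ci") -> str:
    """ALTER statement converting a whole table and its text columns."""
    return (f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET utf8mb4 "
            f"COLLATE {collation};")


def max_indexable_chars(prefix_limit: int = COMPACT_PREFIX_LIMIT,
                        bytes_per_char: int = UTF8MB4_MAX_BYTES) -> int:
    """Longest VARCHAR (in characters) that fits in a full-column index."""
    return prefix_limit // bytes_per_char


def too_long_for_index(indexed_varchars: dict[str, int],
                       prefix_limit: int = COMPACT_PREFIX_LIMIT) -> list[str]:
    """Names of indexed VARCHAR columns that would exceed the limit as utf8mb4."""
    limit = max_indexable_chars(prefix_limit)
    return [name for name, length in indexed_varchars.items() if length > limit]


if __name__ == "__main__":
    for table in ["example_table_a", "example_table_b"]:
        print(convert_table_sql(table))
    print(max_indexable_chars())                  # 191
    print(max_indexable_chars(bytes_per_char=3))  # 255 (legacy utf8)
    print(too_long_for_index({"revision": 40, "url": 255}))  # ['url']
```

Per the MySQL documentation, `CONVERT TO CHARACTER SET` may also promote `TEXT` columns to a larger type (e.g. `MEDIUMTEXT`) so they can still hold as many characters as before, which is worth checking when comparing the resulting schema against the Django models.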
Assignee: emorley → nobody
Summary: Implement longer term solution to one-off utf8->utf8mb4 commit table conversion in bug 1115608 → Implement longer term solution to one-off MySQL utf8->utf8mb4 commit table conversion in bug 1115608

We will see if this becomes a higher priority when we explore a different backend.

Priority: P3 → P5