Closed Bug 1333234 Opened 8 years ago Closed 8 years ago

L10n Routing on Aurora is too large, breaks amqp

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: Callek)

Attachments

(1 file)

In Bug 1323792 we assumed that bumping the route limits would not affect much overall, but it turns out we have hit a hard amqp limit. The overall message header size is configured to a max of ~4kb, and the routes plus some other stuff all have to fit inside it. Since the l10n tasks are stable on central (for now) but break the decision task on aurora, we'll up the chunking on aurora while simultaneously investigating/investing in an alternate way to define the indexes used. This is currently blocking aurora linux/linux64/android nightlies.
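To illustrate why upping the chunking helps: each l10n task carries routes for every locale in its chunk, so more chunks means fewer routes per task. The following is only a rough sketch, assuming locales are split into contiguous, near-equal chunks and that each locale contributes a known number of route characters. `split_into_chunks`, `min_chunks`, the 90 locales, the 300 characters per locale, and the 3000 budget are all hypothetical and are not taken from the taskgraph code.

```python
def split_into_chunks(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def min_chunks(per_locale_lengths, budget):
    """Smallest chunk count for which every chunk's summed route length fits the budget."""
    for n in range(1, len(per_locale_lengths) + 1):
        chunks = split_into_chunks(per_locale_lengths, n)
        if all(sum(chunk) <= budget for chunk in chunks):
            return n
    raise ValueError("no chunk count keeps every chunk within the budget")


# Hypothetical numbers: 90 locales, each adding 300 route characters.
per_locale = [300] * 90
print(min_chunks(per_locale, 3000))
```

The budget passed in should sit well below the ~4kb header limit, since the header holds more than just the routes.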
Having used a desktop nightlies parameters file to generate a full taskgraph on central and aurora I concat'd all the routes into one string and took the length (using jq) and found the following lengths of routes:

== Central ==
cat ../jobs_test1.json | jq '[. | to_entries[] | select(.key) | .value = reduce .value.task.routes[] as $item (0; . + ( $item | length ) ) ] | from_entries' | grep "nightly-l10n\|build.*nightly" | grep -v beetmover
"build-android-api-15-nightly/opt": 548,
"build-android-api-15-nightly/opt-upload-symbols": 156,
"build-android-x86-nightly/opt": 536,
"build-android-x86-nightly/opt-upload-symbols": 156,
"build-linux-nightly/opt": 516,
"build-linux-nightly/opt-upload-symbols": 156,
"build-linux64-nightly/opt": 524,
"build-linux64-nightly/opt-upload-symbols": 156,
"nightly-l10n-android-api-15-nightly-1/opt": 2162,
"nightly-l10n-android-api-15-nightly-2/opt": 2162,
"nightly-l10n-android-api-15-nightly-3/opt": 2144,
"nightly-l10n-android-api-15-nightly-4/opt": 2171,
"nightly-l10n-android-api-15-nightly-5/opt": 2162,
"nightly-l10n-android-api-15-nightly-6/opt": 2171,
"nightly-l10n-linux-nightly-1/opt": 1997,
"nightly-l10n-linux-nightly-2/opt": 2012,
"nightly-l10n-linux-nightly-3/opt": 1976,
"nightly-l10n-linux-nightly-4/opt": 1734,
"nightly-l10n-linux-nightly-5/opt": 1743,
"nightly-l10n-linux-nightly-6/opt": 1734,
"nightly-l10n-linux64-nightly-1/opt": 2039,
"nightly-l10n-linux64-nightly-2/opt": 2054,
"nightly-l10n-linux64-nightly-3/opt": 2018,
"nightly-l10n-linux64-nightly-4/opt": 1770,
"nightly-l10n-linux64-nightly-5/opt": 1779,
"nightly-l10n-linux64-nightly-6/opt": 1770,

== Aurora ==
cat ../jobs_test1.json | jq '[. | to_entries[] | select(.key) | .value = reduce .value.task.routes[] as $item (0; . + ( $item | length ) ) ] | from_entries' | grep "nightly-l10n\|build.*nightly" | grep -v beetmover
"build-android-api-15-nightly/opt": 542,
"build-android-api-15-nightly/opt-upload-symbols": 154,
"build-android-x86-nightly/opt": 530,
"build-android-x86-nightly/opt-upload-symbols": 154,
"build-linux-nightly/opt": 510,
"build-linux-nightly/opt-upload-symbols": 154,
"build-linux64-nightly/opt": 518,
"build-linux64-nightly/opt-upload-symbols": 154,
"nightly-l10n-android-api-15-nightly-1/opt": 4393,
"nightly-l10n-android-api-15-nightly-2/opt": 4426,
"nightly-l10n-android-api-15-nightly-3/opt": 4417,
"nightly-l10n-android-api-15-nightly-4/opt": 4384,
"nightly-l10n-android-api-15-nightly-5/opt": 4417,
"nightly-l10n-android-api-15-nightly-6/opt": 4118,
"nightly-l10n-linux-nightly-1/opt": 4293,
"nightly-l10n-linux-nightly-2/opt": 4323,
"nightly-l10n-linux-nightly-3/opt": 4314,
"nightly-l10n-linux-nightly-4/opt": 4278,
"nightly-l10n-linux-nightly-5/opt": 4320,
"nightly-l10n-linux-nightly-6/opt": 4296,
"nightly-l10n-linux64-nightly-1/opt": 4389,
"nightly-l10n-linux64-nightly-2/opt": 4419,
"nightly-l10n-linux64-nightly-3/opt": 4410,
"nightly-l10n-linux64-nightly-4/opt": 4374,
"nightly-l10n-linux64-nightly-5/opt": 4416,
"nightly-l10n-linux64-nightly-6/opt": 4392,
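For anyone who would rather not decode the jq, the same per-task sum can be sketched in Python. This assumes the taskgraph JSON maps each task label to an object with a `task.routes` list, which is the shape the jq expression above relies on. `route_lengths` is a hypothetical helper name, and the sample graph below is made up for illustration.

```python
import json


def route_lengths(taskgraph):
    """Map each task label to the summed character length of its routes."""
    return {
        label: sum(len(route) for route in entry["task"].get("routes", []))
        for label, entry in taskgraph.items()
    }


# Tiny made-up graph; the routes here are illustrative only.
sample = json.loads("""
{
  "nightly-l10n-linux-nightly-1/opt": {
    "task": {"routes": ["index.example.a", "index.example.bb"]}
  },
  "build-linux-nightly/opt": {"task": {"routes": []}}
}
""")

lengths = route_lengths(sample)
for label in sorted(lengths):
    print(label, lengths[label])
```

Unlike the jq version, this treats a task with no `routes` key as length 0 rather than failing on it.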
(In reply to Justin Wood (:Callek) from comment #1)
> Created attachment 8829684 [details]
> Bug 1333234 - L10n Routing on Aurora is too large.

To save you the trouble, after the patch this route length is:

cat ../jobs_test1.json | jq '[. | to_entries[] | select(.key) | .value = reduce .value.task.routes[] as $item (0; . + ( $item | length ) ) ] | from_entries' | grep "nightly-l10n\|build.*nightly" | grep -v beetmover
"build-android-api-15-nightly/opt": 542,
"build-android-api-15-nightly/opt-upload-symbols": 154,
"build-android-x86-nightly/opt": 530,
"build-android-x86-nightly/opt-upload-symbols": 154,
"build-linux-nightly/opt": 510,
"build-linux-nightly/opt-upload-symbols": 154,
"build-linux64-nightly/opt": 518,
"build-linux64-nightly/opt-upload-symbols": 154,
"nightly-l10n-android-api-15-nightly-1/opt": 2704,
"nightly-l10n-android-api-15-nightly-10/opt": 2423,
"nightly-l10n-android-api-15-nightly-2/opt": 2698,
"nightly-l10n-android-api-15-nightly-3/opt": 2728,
"nightly-l10n-android-api-15-nightly-4/opt": 2710,
"nightly-l10n-android-api-15-nightly-5/opt": 2704,
"nightly-l10n-android-api-15-nightly-6/opt": 2686,
"nightly-l10n-android-api-15-nightly-7/opt": 2713,
"nightly-l10n-android-api-15-nightly-8/opt": 2710,
"nightly-l10n-android-api-15-nightly-9/opt": 2695,
"nightly-l10n-linux-nightly-1/opt": 2748,
"nightly-l10n-linux-nightly-10/opt": 2485,
"nightly-l10n-linux-nightly-2/opt": 2730,
"nightly-l10n-linux-nightly-3/opt": 2778,
"nightly-l10n-linux-nightly-4/opt": 2751,
"nightly-l10n-linux-nightly-5/opt": 2745,
"nightly-l10n-linux-nightly-6/opt": 2736,
"nightly-l10n-linux-nightly-7/opt": 2494,
"nightly-l10n-linux-nightly-8/opt": 2494,
"nightly-l10n-linux-nightly-9/opt": 2479,
"nightly-l10n-linux64-nightly-1/opt": 2808,
"nightly-l10n-linux64-nightly-10/opt": 2539,
"nightly-l10n-linux64-nightly-2/opt": 2790,
"nightly-l10n-linux64-nightly-3/opt": 2838,
"nightly-l10n-linux64-nightly-4/opt": 2811,
"nightly-l10n-linux64-nightly-5/opt": 2805,
"nightly-l10n-linux64-nightly-6/opt": 2796,
"nightly-l10n-linux64-nightly-7/opt": 2548,
"nightly-l10n-linux64-nightly-8/opt": 2548,
"nightly-l10n-linux64-nightly-9/opt": 2533,
Assignee: nobody → bugspam.Callek
Comment on attachment 8829684 [details]
Bug 1333234 - L10n Routing on Aurora is too large.

https://reviewboard.mozilla.org/r/106686/#review107858

As a followup if you have time, it might be nice to log the maximum of those values in the decision task (in the task-creation loop). Then if we run into this again, some log parsing will give us a nice threshold value rather than the guesstimates we have now.
Attachment #8829684 - Flags: review?(dustin) → review+
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> Comment on attachment 8829684 [details]
> Bug 1333234 - L10n Routing on Aurora is too large.
>
> https://reviewboard.mozilla.org/r/106686/#review107858
>
> As a followup if you have time, it might be nice to log the maximum of those
> values in the decision task (in the task-creation loop). Then if we run
> into this again, some log parsing will give us a nice threshold value rather
> than the guesstimates we have now.

Running this across a range of the full .json files produced by decision tasks would probably be easier than scanning/skimming logs for a value, and it avoids clogging up terminals with a cryptic debugging line. Maybe we use grafana or something similar to graph the max instead? And make an estimate of an error threshold (say 3.5k or 4k)?

https://hg.mozilla.org/releases/mozilla-aurora/rev/c5a33bbf8cb46bf40fc4d9f2b619189e7f377230
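A rough sketch of what such an after-the-fact check over a decision task's .json might look like, assuming the same label to `task.routes` shape as the jq in comment #1. `max_route_length`, `over_threshold`, and the 3500 default are hypothetical; the threshold is only the "3.5k" floated above, not a measured limit. The sample routes are synthetic strings sized to match two of the Aurora totals reported earlier.

```python
ROUTE_LENGTH_THRESHOLD = 3500  # assumed value, taken from the "3.5k" estimate


def max_route_length(taskgraph):
    """Return (label, total) for the task whose routes sum to the most characters."""
    totals = {
        label: sum(len(route) for route in entry["task"].get("routes", []))
        for label, entry in taskgraph.items()
    }
    if not totals:
        return None, 0
    worst = max(totals, key=totals.get)
    return worst, totals[worst]


def over_threshold(taskgraph, threshold=ROUTE_LENGTH_THRESHOLD):
    """True when the largest per-task route total exceeds the threshold."""
    _, total = max_route_length(taskgraph)
    return total > threshold


# Synthetic routes whose lengths mirror two Aurora totals from comment #1.
sample = {
    "nightly-l10n-linux64-nightly-1/opt": {"task": {"routes": ["x" * 4389]}},
    "build-linux64-nightly/opt": {"task": {"routes": ["x" * 518]}},
}

label, total = max_route_length(sample)
print(label, total, over_threshold(sample))
```

The `(label, total)` pair is the sort of value that could be graphed per decision task if a metrics pipeline becomes available.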
That's a great point regarding analyzing .json files after the fact. We don't (yet) have a way to create statistics in a task, so I don't think we can gather it that easily, just yet. Hopefully we won't need to!
The follow-on (better) solution is being worked on in https://bugzil.la/1333255, where jonas is working on a way to actually index all these routes on the tasks without needing the extra chunking.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
As :Callek asked on IRC, and now that merge day has happened, I relanded this patch on Aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/577083e852674484f8064f45a9b99cf13e1f9b6f
Product: TaskCluster → Firefox Build System