Open Bug 1178227 Opened 9 years ago Updated 6 years ago

[Meta] Simplify the Treeherder data model

Categories

(Tree Management :: Treeherder, defect, P3)

Tracking

(Not tracked)

People

(Reporter: emorley, Unassigned)

References

(Depends on 3 open bugs)

Details

(Keywords: meta)

We have a whole bunch of complexity to support use-cases we thought might occur, but which haven't yet. Let's remove things we aren't using, since even if a use-case does present itself in the future, we may wish to implement it differently when it actually comes to it.
Keywords: meta
Depends on: 1178232
Depends on: 1178234
Depends on: 1178389
Depends on: 1178395
Some more ideas for later:
* Remove pending_eta/running_eta from the jobs table (IMO we should have one reference data lookup for this in the UI and then use that to generate ETAs for each job dynamically in the UI)
* Sort out machine platform vs build platform (given bug 1056928) and decide if we really need both
* Do something with option_collection
* Remove the "tier" field from the job table (IMO it doesn't belong there; it's a visibility layer thing, not a job property)
* Check field sizes
* Check indexes (both missing and unnecessary)
* Remove the 'type' field from the result_set table
* Rename result_set to "pushes"?
* Replace revision_hash with 'revision' (since I don't think we're going to need to support the multi-repo use case any more, now that most things use a manifest to pin revisions)
* Consolidate the concepts of "project" and "repository"
* Actually start using the "product" field on the jobs table, or else remove it
* Remove the result_set aggregate_id field since it's not used anywhere, and the only reference to what it does is: "A id to use for aggregating result_sets. This is primarily used for supporting a github like pull request work flow but could also be used for any other type of grouping."
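To make the first idea above more concrete (dropping pending_eta/running_eta and generating ETAs from one reference data lookup), here is a minimal sketch. It is in Python only for illustration; the comment proposes doing this in the UI. The names `build_eta_lookup`, `remaining_eta`, `signature` and `start_timestamp` are hypothetical and not taken from the Treeherder code, and using the median of past durations is an assumption about what the reference data could be.

```python
from statistics import median


def build_eta_lookup(history):
    """Build the single reference lookup: job signature -> typical duration.

    `history` maps a (hypothetical) job signature to a list of past
    durations in seconds. Signatures with no history are skipped.
    """
    return {
        signature: median(durations)
        for signature, durations in history.items()
        if durations
    }


def remaining_eta(job, lookup, now):
    """Estimate seconds left for a running job, computed on demand.

    Returns None when there is no reference data for the job, and never
    returns a negative value for jobs that have overrun their estimate.
    """
    expected = lookup.get(job["signature"])
    if expected is None:
        return None
    elapsed = now - job["start_timestamp"]
    return max(0, expected - elapsed)
```

With something along these lines, nothing ETA-related would need to be stored per job; only the start time and a signature are read.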
Depends on: 1178641
* machine_note table
After the objectstore is removed in bug 1140349:
1. Delete the objectstore databases.
2. Remove the ``contenttype`` field from the datasources table.
Depends on: 1140349
Depends on: 1178852
Depends on: 1178868
Depends on: 1179011
Depends on: 1179043
Depends on: 1179203
Depends on: 1179214
Depends on: 1181572
Depends on: 1182455
Depends on: 1183137
Depends on: 1185030
Depends on: 1190343
Assignee: emorley → nobody
Depends on: 1198786
In looking at the data format for our pulse ingestion, I'm scrutinizing each field. Some of these stood out as perhaps not used or useful. Need a 50 cent word here... vestigial? :)

It seems we don't store:
build_url
machine VM status

We do store "product" (as opposed to "project") but we don't appear to use it in the UI. Do we need/want this?

Perhaps these are less about "data model" and more about data ingestion.
Flags: needinfo?(emorley)
(In reply to Cameron Dawson [:camd] from comment #7)
> It seems we don't store:
> build_url
> machine VM status

Let's get rid of them both. Also from the object here:
https://github.com/mozilla/treeherder/blob/master/treeherder/etl/buildbot.py#L981
(and related)

> We do store "product" (as opposed to "project") but we don't appear to use
> it in the UI. Do we need/want this?

So I think the reason for this was so we could differentiate between products (eg: "Firefox", "Firefox for Android", "B2G", ...) and projects (iirc that's the name we use for repos? eg "mozilla-central", "mozilla-inbound", ...). The complication is that a "project" could have several "products" built from it (eg just look at the different rows on mozilla-central). Plus at the time we did the spec for Treeherder, we were thinking some builds might use multiple repos, so have two ways to slice the data (eg Thunderbird builds, where we could slice it by mozilla-central repo pushes or by comm-central pushes).

I think there is a use case for having "product" around - which is say the B2G team wanting to run queries against all B2G jobs across multiple repositories. So really product would be a grouping of job types - thereby reducing the overloading of "os_platform" - see bug 1060769.

I think we maybe need to decide our plan for this long term?
Flags: needinfo?(emorley)
(In reply to Ed Morley [:emorley] from comment #8)
> I think we maybe need to decide our plan for this long term?

And by that I mean that we take a look at bug 1060769 comment 1 and see if that or something similar makes sense :-)
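As a rough illustration of the "product" use case described in comment 8 (all jobs for one product across several repositories), the grouping could look something like the sketch below. The function name `jobs_by_product` and the dict keys `product`, `project` and `job_type` are assumptions made for this example, not the actual Treeherder schema or API.

```python
from collections import defaultdict


def jobs_by_product(jobs):
    """Group job records by product, regardless of which repo they ran on."""
    grouped = defaultdict(list)
    for job in jobs:
        grouped[job["product"]].append(job)
    return dict(grouped)


# Hypothetical job records: one project can build several products,
# and one product can be built from several projects.
jobs = [
    {"product": "B2G", "project": "mozilla-central", "job_type": "build"},
    {"product": "Firefox", "project": "mozilla-central", "job_type": "build"},
    {"product": "B2G", "project": "mozilla-inbound", "job_type": "test"},
]

b2g_jobs = jobs_by_product(jobs)["B2G"]
b2g_projects = sorted({job["project"] for job in b2g_jobs})
```

Here `b2g_projects` covers both repositories, which is the slice that filtering on "project" alone cannot give.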
Depends on: 1202626
Depends on: 1198536
Depends on: 1211253
Depends on: 1211715
Depends on: 1211836
Depends on: 1199364
Remaining candidates:
* Remove running_eta from the jobs table (and look up dynamically)
* Sort out machine platform vs build platform (given bug 1056928) and decide if we really need both
* Do something with option_collection
* Remove the "tier" field from the job table (IMO it doesn't belong there; it's a visibility layer thing, not a job property)
* Remove the "type" field from the result_set table
* Consolidate the concepts of "project" and "repository"
* Actually start using the "product" field on the jobs table, or else remove it

+ checking field sizes and checking indexes (both missing and unnecessary)
Depends on: 1196764
Depends on: 1328985, 1387640
Depends on: 1346565
Depends on: 1306707, 1257602
Assignee: nobody → emorley
Depends on: 1419965
Depends on: 1402992
Depends on: 1416861
Assignee: emorley → nobody
Depends on: 1469569
Depends on: 1470381
Depends on: 1472680
Depends on: 1482375