Bug 1178227 (Open)
[Meta] Simplify the Treeherder data model
Opened 9 years ago; updated 6 years ago
Categories: Tree Management :: Treeherder, defect, P3
Tracking: Not tracked
Status: NEW
People: Reporter: emorley; Unassigned
References: Depends on 3 open bugs
Details: Keywords: meta
We have a whole bunch of complexity to support use cases we thought might occur but haven't yet. Let's remove the things we aren't using: even if a use case does present itself in the future, we may wish to implement it differently when it actually comes to it.
Comment 1 (Reporter) • 9 years ago
Some more ideas for later:
* Remove pending_eta/running_eta from the jobs table (IMO we should have one reference data lookup for this in the UI and then use that to generate ETAs for each job dynamically in the UI)
* Sort out machine platform vs build platform (given bug 1056928) and decide if we really need both
* Do something with option_collection
* Remove the "tier" field from the job table (IMO it doesn't belong there; it's a visibility-layer thing, not a job property)
* Check field sizes
* Check indexes (both missing and unnecessary)
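A minimal sketch of the "compute ETAs dynamically" idea from the first bullet, assuming one shared reference lookup of average run durations keyed by job signature. The names `AVERAGE_DURATIONS`, `Job` and `estimate_eta`, and the signature strings, are hypothetical and not Treeherder's actual schema or API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Mapping, Optional

# Hypothetical reference data: a single lookup of average run durations
# (in seconds) keyed by job signature, replacing a per-row running_eta column.
AVERAGE_DURATIONS: Mapping[str, int] = {
    "linux64-opt-mochitest-1": 1200,
    "win10-debug-xpcshell": 2400,
}


@dataclass
class Job:
    signature: str
    start_time: datetime


def estimate_eta(
    job: Job, durations: Mapping[str, int] = AVERAGE_DURATIONS
) -> Optional[datetime]:
    """Estimated finish time of a running job, or None if no reference data exists."""
    avg_seconds = durations.get(job.signature)
    if avg_seconds is None:
        return None
    return job.start_time + timedelta(seconds=avg_seconds)


job = Job("linux64-opt-mochitest-1", datetime(2015, 7, 1, 12, 0, 0))
print(estimate_eta(job))  # 2015-07-01 12:20:00
```

Because the ETA is derived at read time, updating the reference durations immediately changes every displayed estimate without rewriting any job rows.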
Comment 2 (Reporter) • 9 years ago
* Remove the 'type' field from the result_set table
* Rename result_set to "pushes"?
* Replace revision_hash with 'revision' (since I don't think we're going to need to support the multi-repo use case any more, now that most things use a manifest to pin revisions)
Comment 3 (Reporter) • 9 years ago
* Consolidate the concepts of "project" and "repository"
* Actually start using the "product" field on the jobs table, or else remove it
Comment 4 (Reporter) • 9 years ago
* Remove the result_set aggregate_id field since it's not used anywhere, and the only reference to what it does is: "A id to use for aggregating result_sets. This is primarily used for supporting a github like pull request work flow but could also be used for any other type of grouping."
Comment 5 (Reporter) • 9 years ago
* Remove the machine_note table
Comment 6 • 9 years ago
After the objectstore is removed in bug 1140349:
1. Delete the objectstore databases.
2. Remove the ``contenttype`` field from the datasources table.
Updated (Reporter) • 9 years ago
Assignee: emorley → nobody
Comment 7 • 9 years ago
In looking at the data format for our Pulse ingestion, I'm scrutinizing each field. Some of these stood out as perhaps unused or not useful. Need a 50-cent word here... vestigial? :)
It seems we don't store:
* build_url
* machine VM status
We do store "product" (as opposed to "project") but we don't appear to use it in the UI. Do we need/want this?
Perhaps these are less about "data model" and more about data ingestion.
Flags: needinfo?(emorley)
Comment 8 (Reporter) • 9 years ago
(In reply to Cameron Dawson [:camd] from comment #7)
> It seems we don't store:
> build_url
> machine VM status
Let's get rid of them both. Also from the object here:
https://github.com/mozilla/treeherder/blob/master/treeherder/etl/buildbot.py#L981 (and related)
> We do store "product" (as opposed to "project") but we don't appear to use
> it in the UI. Do we need/want this?
So I think the reason for this was so we could differentiate between products (e.g. "Firefox", "Firefox for Android", "B2G", ...) and projects (IIRC that's the name we use for repos? e.g. "mozilla-central", "mozilla-inbound", ...).
The complication is that a "project" could have several "products" built from it (e.g. just look at the different rows on mozilla-central). Plus, at the time we wrote the spec for Treeherder, we were thinking some builds might use multiple repos, so we wanted two ways to slice the data (e.g. Thunderbird builds, which could be sliced by mozilla-central pushes or by comm-central pushes).
I think there is a use case for keeping "product" around: say, the B2G team wanting to run queries against all B2G jobs across multiple repositories. So really, product would be a grouping of job types, thereby reducing the overloading of "os_platform" (see bug 1060769).
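A minimal sketch of the product/project split discussed here, assuming hypothetical job records that carry both a repository (the "project") and a product. It shows the two independent axes: one product queried across several repositories, and one repository yielding several products. The field names and values are illustrative only, not Treeherder's schema:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Job:
    repository: str  # the "project", e.g. "mozilla-central"
    product: str     # e.g. "Firefox", "B2G"
    job_type: str    # hypothetical job type name


def jobs_for_product(jobs: Iterable[Job], product: str) -> List[Job]:
    """All jobs for one product, regardless of which repository they ran on."""
    return [job for job in jobs if job.product == product]


def products_by_repository(jobs: Iterable[Job]) -> Dict[str, Set[str]]:
    """Map each repository to the set of products built from it."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for job in jobs:
        result[job.repository].add(job.product)
    return dict(result)


jobs = [
    Job("mozilla-central", "Firefox", "mochitest-1"),
    Job("mozilla-central", "B2G", "gaia-unit"),
    Job("mozilla-inbound", "B2G", "gaia-unit"),
]
print(len(jobs_for_product(jobs, "B2G")))  # 2
print(sorted(products_by_repository(jobs)["mozilla-central"]))  # ['B2G', 'Firefox']
```

Keeping product as its own field, rather than encoding it in the platform name, lets the cross-repository query be a plain filter on one column.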
I think we maybe need to decide our plan for this long term?
Flags: needinfo?(emorley)
Comment 9 (Reporter) • 9 years ago
(In reply to Ed Morley [:emorley] from comment #8)
> I think we maybe need to decide our plan for this long term?
And by that I mean that we should take a look at bug 1060769 comment 1 and see if that or something similar makes sense :-)
Comment 10 (Reporter) • 9 years ago
Remaining candidates:
* Remove running_eta from the jobs table (and look up dynamically)
* Sort out machine platform vs build platform (given bug 1056928) and decide if we really need both
* Do something with option_collection
* Remove the "tier" field from the job table (IMO it doesn't belong there; it's a visibility layer thing, not a job property)
* Remove the "type" field from the result_set table
* Consolidate the concepts of "project" and "repository"
* Actually start using the "product" field on the jobs table, or else remove it
* Check field sizes and indexes (both missing and unnecessary)
Updated (Reporter) • 7 years ago
Assignee: nobody → emorley
Updated (Reporter) • 6 years ago
Assignee: emorley → nobody