Set up BigQuery to accept test summary data
Categories
(Testing :: General, task, P3)
Tracking
(Not tracked)
People
(Reporter: ekyle, Unassigned)
References
(Blocks 2 open bugs)
Details
Set up tables and access in BigQuery to accept data from test-info artifacts.
Example:
https://firefoxci.taskcluster-artifacts.net/Wk9XNxyzQDKVd4CC9g2taQ/0/public/test-info-all-tests.json
Treeherder link for more:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=test-info
Updated•5 years ago
Reporter
Comment 1•5 years ago
I think there are two ways of doing this:
- Package https://github.com/mozilla/push-to-bigquery and call it from the task that generates the test-info-all-tests artifact. This way provides the simplest code, but will have some bumps as I learn about bugs at scale.
- Set up a separate ETL process: either in the ActiveData-ETL pipeline, or Treeherder ingestion, or elsewhere. This way allows the existing code to stay unchanged, and the short-term deployment/scaling bugs are contained in the ETL pipeline. It gives us a more complicated dataflow, but allows us to change the ETL and redirect the dataflow without changing the task.
Both options will require the data to be transformed into fixed-property JSON format, where data values are NOT encoded in the property names.
Instead of
"Cloud Services::Firefox: Common": [
{
"failed runs": 0,
"skipped runs": 0,
"test": "services/common/tests/unit/test_async_chain.js",
"total run time, seconds": 6205.07,
"total runs": 3504
},
{
"failed runs": 0,
"skipped runs": 0,
"test": "services/common/tests/unit/test_async_foreach.js",
"total run time, seconds": 14888.07,
"total runs": 3501
},
we have
{
"component": "Cloud Services::Firefox: Common",
"failed runs": 0,
"skipped runs": 0,
"test": "services/common/tests/unit/test_async_chain.js",
"total run time, seconds": 6205.07,
"total runs": 3504
},
{
"component": "Cloud Services::Firefox: Common",
"failed runs": 0,
"skipped runs": 0,
"test": "services/common/tests/unit/test_async_foreach.js",
"total run time, seconds": 14888.07,
"total runs": 3501
},
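The transformation described above can be sketched in Python. This is a minimal, hypothetical helper (not the push-to-bigquery implementation); the field names are taken from the example records in this bug:

```python
import json

def flatten_test_info(by_component):
    """Convert the nested test-info structure, which is keyed by component
    name, into a flat list of records with a fixed set of properties.
    The component name moves from the property key into a "component" field."""
    records = []
    for component, tests in by_component.items():
        for test in tests:
            record = dict(test)              # copy the fixed properties
            record["component"] = component  # lift the key into a property
            records.append(record)
    return records

# One record from the example data above
nested = {
    "Cloud Services::Firefox: Common": [
        {
            "failed runs": 0,
            "skipped runs": 0,
            "test": "services/common/tests/unit/test_async_chain.js",
            "total run time, seconds": 6205.07,
            "total runs": 3504,
        },
    ],
}

flat = flatten_test_info(nested)
print(json.dumps(flat, indent=2))
```

Because every record now carries the same fixed set of properties, the output maps directly onto a BigQuery table schema instead of requiring one column (or one table) per component.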
Comment 2•5 years ago
I don't have much preference between your two options.
I can make the change to fixed-property json format (I don't like it much, but if it is needed, that's okay).
Reporter
Comment 3•5 years ago
:gbrown, hold off on the reformatting while I think about it. The transformation is usually done in the ETL pipeline, and if one is set up, then it belongs there.
What other information should be attached to this data? If we are attaching data, like the task information, then the transformation definitely belongs in a separate ETL pipeline.
Comment 4•5 years ago
OK, will hold off.
Your mention of "task information" reminds me that I want the dashboard to provide some sort of view of how data changes over time: the date/time the report was generated and the revision it was based on should be included. I wouldn't mind adding those to the .json file itself; let me know what you think. Otherwise, I can't think of any other information to attach.
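The metadata suggested above (generation time and source revision) could be stamped onto each record before upload. A minimal sketch, assuming the flat record format from comment 1; the field names and the revision value here are placeholders, not an agreed schema:

```python
import datetime

def attach_metadata(records, revision):
    """Add the report generation time and the source revision to each
    flattened record, so a dashboard can show how the data changes over time."""
    generated = datetime.datetime.now(datetime.timezone.utc).isoformat()
    for record in records:
        record["report date"] = generated
        record["revision"] = revision
    return records

# Hypothetical usage with one record and a placeholder hg revision
rows = attach_metadata(
    [{"test": "services/common/tests/unit/test_async_chain.js"}],
    revision="abcdef123456",
)
print(rows[0]["revision"])
```

Alternatively, as suggested above, these two fields could be written into the .json artifact itself by the task that generates the report, which keeps the ETL step simpler.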
Comment 5•3 years ago
The bug assignee hasn't logged in to Bugzilla in the last 7 months.
:ahal, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 6•3 years ago
This is still something we'd like, but if we go forward it should be driven by the data team. If someone from EE does end up writing a new ETL, we can file a new bug and follow their recommendations.