Closed Bug 1604525 Opened 5 years ago Closed 3 years ago

Setup BigQuery to accept test summary data

Categories

(Testing :: General, task, P3)

Version 3
task

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ekyle, Unassigned)

References

(Blocks 2 open bugs)

Details

Blocks: 1604529
Priority: -- → P3

I think there are two ways of doing this:

  1. Package https://github.com/mozilla/push-to-bigquery and call it from the task that generates the test-info-all-tests report. This gives the simplest code, but there will be some bumps as I discover bugs at scale.
  2. Set up a separate ETL process, either in the ActiveData-ETL pipeline, in Treeherder ingestion, or elsewhere. This lets the existing code stay unchanged, and it contains the short-term deployment and scaling bugs within the ETL pipeline. The dataflow becomes more complicated, but we can change the ETL and redirect the dataflow without changing the task.
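
If option 2 is chosen, the ETL step would finish by writing records in a form BigQuery can load. BigQuery load jobs take JSON input as newline-delimited JSON (one object per line), not as one large document. A minimal Python sketch, assuming the records are already flat dicts; `to_ndjson` is a hypothetical helper, not part of push-to-bigquery, and the sample record is trimmed from the example in this bug:

```python
import json
from typing import Any, Iterable, Mapping


def to_ndjson(records: Iterable[Mapping[str, Any]]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    # json.dumps only accepts real dicts, so copy each Mapping into one.
    # sort_keys=True keeps the output stable across runs, which eases diffing.
    return "".join(json.dumps(dict(r), sort_keys=True) + "\n" for r in records)


# Hypothetical sample record in a flat, one-object-per-test shape
records = [
    {
        "component": "Cloud Services::Firefox: Common",
        "test": "services/common/tests/unit/test_async_chain.js",
        "total runs": 3504,
    },
]
print(to_ndjson(records), end="")
```

The resulting text could be written to a file for a load job; how the pipeline would actually deliver it is left open here.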

Both options require the data to be transformed into a fixed-property JSON format, where data is NOT encoded in the property names.

Instead of

    "Cloud Services::Firefox: Common": [
      {
        "failed runs": 0, 
        "skipped runs": 0, 
        "test": "services/common/tests/unit/test_async_chain.js", 
        "total run time, seconds": 6205.07, 
        "total runs": 3504
      }, 
      {
        "failed runs": 0, 
        "skipped runs": 0, 
        "test": "services/common/tests/unit/test_async_foreach.js", 
        "total run time, seconds": 14888.07, 
        "total runs": 3501
      }, 

we would have

      {
        "component": "Cloud Services::Firefox: Common",
        "failed runs": 0, 
        "skipped runs": 0, 
        "test": "services/common/tests/unit/test_async_chain.js", 
        "total run time, seconds": 6205.07, 
        "total runs": 3504
      }, 
      {
        "component": "Cloud Services::Firefox: Common",
        "failed runs": 0, 
        "skipped runs": 0, 
        "test": "services/common/tests/unit/test_async_foreach.js", 
        "total run time, seconds": 14888.07, 
        "total runs": 3501
      }, 
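
The reshaping above is mechanical. A minimal Python sketch, assuming the report is a JSON object that maps component names to lists of per-test records, as in the first example; `flatten_by_component` is a hypothetical name, and whether it runs in the task (option 1) or in an ETL step (option 2) is what this bug leaves open:

```python
import json
from typing import Any, Dict, List


def flatten_by_component(report: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Move the component name out of the property name and into each record.

    Input:  {"<component>": [{...test record...}, ...], ...}
    Output: [{"component": "<component>", ...test record...}, ...]
    """
    flat = []
    for component, tests in report.items():
        for test in tests:
            # Build a new dict so the caller's input is not mutated;
            # "component" comes first, then the original test fields.
            flat.append({"component": component, **test})
    return flat


report = json.loads("""
{
  "Cloud Services::Firefox: Common": [
    {"failed runs": 0, "skipped runs": 0,
     "test": "services/common/tests/unit/test_async_chain.js",
     "total run time, seconds": 6205.07, "total runs": 3504}
  ]
}
""")
for row in flatten_by_component(report):
    print(row["component"], row["test"])
```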
Flags: needinfo?(gbrown)

I don't have much preference between your 2 options.

I can make the change to fixed-property json format (I don't like it much, but if it is needed, that's okay).

Flags: needinfo?(gbrown)
Depends on: 1610632

:gbrown, hold off on the reformatting while I think about it. The transformation is usually done in the ETL pipeline, and if one is set up, then it belongs there.

What other information should be attached to this data? If we are attaching data, like the task information, then the transformation definitely belongs in a separate ETL pipeline.

Flags: needinfo?(gbrown)

OK, will hold off.

Your mention of "task information" reminds me that I want the dashboard to provide some sort of view of how data changes over time: the date/time the report was generated and the revision it was based on should be included. I wouldn't mind adding those to the .json file itself; let me know what you think. Otherwise, I can't think of any other information to attach.
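
In a fixed-property layout, report-level facts like these can simply be repeated on every row, so rows can be filtered by generation date or revision without a join. A sketch under that assumption; `annotate` and the field names `revision` and `report_generated` are hypothetical, not an agreed schema:

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def annotate(records: List[Dict[str, Any]], revision: str,
             generated: datetime) -> List[Dict[str, Any]]:
    """Return copies of the records with report-level metadata on each one."""
    if generated.tzinfo is None:
        raise ValueError("generated must be timezone-aware")
    # Normalize to UTC so reports produced on different machines compare cleanly
    stamp = generated.astimezone(timezone.utc).isoformat()
    # Hypothetical field names; dict(r, ...) copies r and adds the two keys
    return [dict(r, revision=revision, report_generated=stamp) for r in records]


rows = annotate(
    [{"test": "services/common/tests/unit/test_async_chain.js"}],
    revision="0123abcd",  # placeholder, not a real revision
    generated=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
print(rows[0])
```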

Flags: needinfo?(gbrown)

The bug assignee hasn't logged in to Bugzilla in the last 7 months.
:ahal, could you have a look please?
For more information, please see the auto_nag documentation.

Assignee: klahnakoski → nobody
Flags: needinfo?(ahal)

This is still something we'd like, but if we go forward it should be driven by the data team. If someone from EE does end up writing a new ETL, we can file a new bug and follow their recommendations.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(ahal)
Resolution: --- → WONTFIX