Open Bug 1614541 Opened 5 years ago Updated 2 years ago

small number of libxul / omni.ja build ID mismatches

Categories

(Firefox :: General, defect, P3)

People

(Reporter: heycam, Assigned: rhelmer)

Huh, interesting. It's quite a bit lower than "omnijar corrupted" (which is a signature check of both omni JARs):

https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2020-02-08&include_spill=0&keys=__none__!__none__!__none__&max_channel_version=nightly%252F74&measure=SCALARS_CORROBORATE.OMNIJAR_CORRUPTED&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_by_value=0&sort_keys=submissions&start_date=2020-02-06&table=0&trim=1&use_submission_date=0

I'm curious to see the relationship between the two.

I don't see the "omnijar mismatch" scalar in BigQuery yet, following up on that with data folks. Once that's available, we can see if there are any other interesting correlations (presence/type of anti-virus, OS, number of crashes, etc.)

Assignee: nobody → rhelmer

I'm also interested to see if/how this changes as they ride the trains; the release population is pretty different from Nightly.

OK, so it's payload.processes.parent.scalars.corroborate_omnijar_mismatch in telemetry.main (I was looking in telemetry.main_summary). I'll see what I can dig up today.

Priority: -- → P1

I took a quick stab at using BigQuery's correlation function (with some help from tdsmith) to make it easier to correlate problems like a broken omni JAR vs. a wrong build ID in the omni JAR. If this is correct, then it looks like an unsigned omni JAR is negatively correlated with an incorrect build ID, so they are likely different phenomena: https://sql.telemetry.mozilla.org/queries/68218/source

I went through the hassle of doing it this way so that it's now easier to plug in different sorts of correlates:

(In reply to Robert Helmer [:rhelmer] from comment #1)

> I don't see the "omnijar mismatch" scalar in BigQuery yet, following up on that with data folks. Once that's available, we can see if there are any other interesting correlations (presence/type of anti-virus, OS, number of crashes, etc.)
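
For anyone who wants to sanity-check the idea offline, here is a rough sketch of the kind of correlation described above. It is not the linked query. It computes a plain Pearson coefficient over two 0/1 flags per ping, and the key is passed in by name so another correlate can be swapped in. The sample pings are made up, and the key name `corroborate_omnijar_corrupted` is an assumption modelled on the mismatch scalar; `pearson` and `correlate` are hypothetical helpers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined when one flag never varies
    return cov / sqrt(var_x * var_y)


def correlate(pings, key_a, key_b):
    """Correlate two boolean scalars across pings; a missing key counts as False."""
    xs = [1 if ping.get(key_a) else 0 for ping in pings]
    ys = [1 if ping.get(key_b) else 0 for ping in pings]
    return pearson(xs, ys)


# Made-up sample data, NOT real telemetry. Key names are assumptions.
CORRUPTED = "corroborate_omnijar_corrupted"
MISMATCH = "corroborate_omnijar_mismatch"
sample_pings = (
    [{CORRUPTED: True, MISMATCH: False}] * 2
    + [{CORRUPTED: False, MISMATCH: True}] * 2
    + [{CORRUPTED: False, MISMATCH: False}] * 4
)

r = correlate(sample_pings, CORRUPTED, MISMATCH)
print(f"correlation: {r:.3f}")
```

With this made-up sample the two flags never occur together, so the coefficient comes out negative. To try a different correlate (OS, anti-virus presence, etc.), encode it as another 0/1 key and pass that key name instead.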

The theories I've been able to dream up so far around the mismatched build ID (setting aside the signature problem for now):

  1. our updater has a bug (or something is broken on the user's machine) and sometimes the omni JAR doesn't get updated
  2. users (or malware) are copying an old omni JAR into place on purpose

However, before trying to really figure out the cause, it probably makes sense to measure the impact of this to see how much it's worth digging into. I think this is what we know so far:

Given the small number of crashes, and small numbers being reported by the telemetry on Nightly, I think it would be fine to wait until the telemetry hits Release before sinking a lot of time into investigating this.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Product: Firefox → Firefox Build System
Product: Firefox Build System → Firefox
Severity: normal → S4
Priority: P1 → P3