Closed
Bug 867342
Opened 12 years ago
Closed 11 years ago
crash reason missing since 2013-04-09
Categories
(Socorro :: General, task)
Socorro
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rhelmer, Assigned: rhelmer)
References
Details
Attachments
(1 file, 2 obsolete files)
(deleted),
patch
This is breaking correlations - https://bugzilla.mozilla.org/show_bug.cgi?id=836671#c47
Comment 1•12 years ago
scanning through the reports table in production, I see no lack of 'reason'. There are some records that have no value for 'reason', but those are all directly associated with failures in MDSW. Where are you seeing 'reason' missing?
Assignee
Comment 2•12 years ago
(In reply to K Lars Lohn [:lars] [:klohn] from comment #1)
> scanning through the reports table in production, I see no lack of 'reason'.
> There are some records that have no value for 'reason', but those are all
> directly associated with failures in MDSW. Where are you seeing 'reason'
> missing?
I had the same experience looking at the reports table. I suspect this is what's happening because of the output from debugging I've added to the correlation script:
Traceback (most recent call last):
  File "./crash-data-tools/per-crash-interesting-modules.py", line 122, in <module>
    signame = signame + "|" + crash["reason"]
TypeError: coercing to Unicode: need string or buffer, NoneType found
[2013-04-20 19:01:54] per-crash-interesting-modules.py > /tmp/tmp.5BDsxS1n4l/20130420_Firefox_20.0.1-interesting-modules-with-versions.txt
rhelmer debug: crash["reason"] is None
I'll go ahead and get it to print out the crash ID as well; it may be that only a subset of crashes is affected. In any case, this has only been appearing since 2013-04-09.
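For illustration only, the extra debugging described above might look something like the sketch below. It assumes each crash is a dict and that the crash ID is stored under a hypothetical "uuid" key; only crash["reason"] appears in the traceback, and the sample records are made up.

```python
def crashes_missing_reason(crashes):
    """Return the IDs of crashes whose 'reason' is None or empty.

    The "uuid" key is a hypothetical name for the crash ID field.
    """
    return [crash.get("uuid") for crash in crashes if not crash.get("reason")]


# Made-up sample records, not real crash data.
sample = [
    {"uuid": "crash-a", "reason": "EXCEPTION_ACCESS_VIOLATION_READ"},
    {"uuid": "crash-b", "reason": None},
]
print(crashes_missing_reason(sample))
```

Listing the IDs this way would show whether every crash is affected or only a subset.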
Assignee
Comment 3•12 years ago
Here is an example of a crash with an empty reason field:
https://crash-stats.mozilla.com/report/index/67fd2ffd-8d97-415b-a4d2-0a4a32130414
Comment 4•12 years ago
notice that MDSW failed on that crash. The processor cannot supply a 'reason' if MDSW failed...
Comment 5•12 years ago
also note that this problem apparently begins two days _before_ the new processor went into production. What changed on the 9th?
Comment 6•12 years ago
I had a look at breakpad - there were two bugfixes on April 9th:
http://code.google.com/p/google-breakpad/source/detail?r=1145
http://code.google.com/p/google-breakpad/source/detail?r=1146
Comment 7•12 years ago
I've looked at the same crash signature from the 1st week of April vs the 2nd week of April. MDSW succeeds in the first week and fails in the second week on the same crash...
Did we get a new MDSW on the 9th?
Comment 8•12 years ago
failing MDSW: 5daa2a7e-666d-4d4e-97b7-31e032130409
succeeding MDSW: c91b981d-a453-458b-b69f-290962130403
Updated•12 years ago
Assignee: lars → rhelmer
Component: Backend → General
Updated•12 years ago
Severity: normal → critical
Assignee
Comment 9•11 years ago
I don't recall what action we decided on here, but I see this is assigned to me so I am going to make the correlation scripts ignore a blank crash reason.
Status: NEW → ASSIGNED
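A minimal sketch of what "ignore a blank crash reason" could mean for the failing line in the traceback from comment 2. This is not the attached patch; the function name and the "signature" key are assumptions made for illustration, and only signame and crash["reason"] come from the traceback.

```python
def signature_key(crash):
    """Build the grouping key, appending the reason only when one is present.

    "signature" is a hypothetical field name; crash["reason"] is the field
    that raised the TypeError when it was None.
    """
    signame = crash["signature"]
    reason = crash.get("reason")
    if reason:  # skips both None and the empty string
        signame = signame + "|" + reason
    return signame
```

With a guard like this, a crash without a reason is grouped under its signature alone instead of aborting the whole run.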
Assignee
Comment 10•11 years ago
Testing a fix with 20130510 now.
Assignee
Comment 11•11 years ago
https://crash-analysis.mozilla.com/crash_analysis/20130510/ looks better, but a little odd:
* I notice this at the top of the reports, where OS should go (which is new) in e.g. https://crash-analysis.mozilla.com/crash_analysis/20130510/20130510_Firefox_21.0-interesting-addons-with-versions.txt.gz:
None
EMPTY: no crashing thread identified; corrupt dump (5796 crashes)
* The .txt version seems to be 0-byte for https://crash-analysis.mozilla.com/crash_analysis/20130510/20130510_Firefox_21.0-interesting-addons-with-versions.txt but the .gz is fine
The second point is really odd; I am not sure why we produce both of these anyway. I think the first point is an artifact of the client-side bug suspected of causing this.
Ted, do you happen to know the bug# of that client-side bug? ^
Flags: needinfo?(ted)
Assignee
Comment 12•11 years ago
OK, I have removed the existing 0-byte files and am backfilling for April and May 2013 (backwards from May 9th).
I will work on getting this patch into the upstream crash data tools and deployed.
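As a rough sketch of the cleanup step mentioned above, 0-byte report files could be located like this before deleting them. The directory layout and the "*.txt" pattern are assumptions based on the report URLs in comment 11, not the actual procedure used.

```python
from pathlib import Path


def find_empty_files(root, pattern="*.txt"):
    """Return, in sorted order, the files under root that match pattern and are 0 bytes."""
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if path.is_file() and path.stat().st_size == 0
    )
```

Reviewing the returned list first, and only then calling path.unlink() on each entry, keeps the deletion step explicit.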
Comment 13•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #11)
> None
> EMPTY: no crashing thread identified; corrupt dump (5796 crashes)
That's expected, most of those have empty dumps and the information on OS is actually in the dump, so we don't know an OS there right now (Ted has filed a bug against himself to help improve that).
Comment 14•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #11)
> * I notice this at the top of the reports, where OS should go (which is new)
> in e.g.
> https://crash-analysis.mozilla.com/crash_analysis/20130510/
> 20130510_Firefox_21.0-interesting-addons-with-versions.txt.gz:
> None
> EMPTY: no crashing thread identified; corrupt dump (5796 crashes)
There were no crash correlations for that signature (with no crash reason) previously.
Comment 15•11 years ago
Backfilled files give a 403 Forbidden error.
Assignee
Comment 16•11 years ago
(In reply to Scoobidiver from comment #15)
> Backfilled files give a 403 Forbidden error.
I've corrected this for the files done so far.
Comment 18•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #12)
> I will work on getting this patch into the upstream crash data tools and
> deployed.
Do you have a deadline for that? The release of a new version demands up-to-date crash correlations.
Assignee
Comment 19•11 years ago
(In reply to Scoobidiver from comment #18)
> (In reply to Robert Helmer [:rhelmer] from comment #12)
> > I will work on getting this patch into the upstream crash data tools and
> > deployed.
> Do you have a deadline for that because the release of a new version demands
> up-to-date crash correlations?
The previous backfill completed and looks ok, I'll put up the patch shortly.
Just kicked off 201305{15..11} right now too.
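The 201305{15..11} shorthand above is a descending run of report dates. A small sketch of generating the same sequence, assuming the YYYYMMDD naming seen in the crash_analysis URLs; the function name is made up for illustration and is not part of the backfill script.

```python
from datetime import date, timedelta


def backfill_dates(newest, oldest):
    """Yield YYYYMMDD strings from newest down to oldest, inclusive."""
    day = newest
    while day >= oldest:
        yield day.strftime("%Y%m%d")
        day -= timedelta(days=1)


for stamp in backfill_dates(date(2013, 5, 15), date(2013, 5, 11)):
    print(stamp)
```

Working from the newest date backwards means the most recent reports become available first.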
Assignee
Comment 20•11 years ago
Bug 870165 is causing crash reason to be empty for some crashes.
Attachment #749942 -
Flags: review?(dbaron)
Comment 21•11 years ago
Comment on attachment 749942 [details] [diff] [review]
ignore empty crash reason
Seems reasonable, though please use diff -u.
(Not sure what the added "import sys" is for, though it shouldn't do any harm either.)
Attachment #749942 -
Flags: review?(dbaron) → review+
Assignee
Comment 22•11 years ago
Bug 870165 is causing crash reason to be empty for some crashes (same as previous patch but remove debugging and make it apply cleanly to hg checkout)
Attachment #749942 -
Attachment is obsolete: true
Attachment #749944 -
Flags: review?(dbaron)
Assignee
Comment 23•11 years ago
Attachment #749944 -
Attachment is obsolete: true
Attachment #749944 -
Flags: review?(dbaron)
Assignee
Comment 24•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #23)
> Created attachment 749951 [details] [diff] [review]
> ignore empty crash reason (as landed)
Landed as http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/rev/0d9be01ab7ce (accidentally pushed a commit for bug 788055 originally, so backed it out and pushed this instead)
Comment 25•11 years ago
There's still the forbidden permission for recently backfilled files.
(In reply to Robert Helmer [:rhelmer] from comment #24)
> Landed as
> http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/rev/
> 0d9be01ab7ce (accidentally pushed a commit for bug 788055 originally, so
> backed it out and pushed this instead)
Is it in prod, or should we wait for Socorro 47?
Assignee
Comment 26•11 years ago
(In reply to Scoobidiver from comment #25)
> There's still the forbidden permission for recently backfilled files.
Fixed this and continued backfilling. I think this is a bug in the cron job: it depends on a side effect of another job to set the permissions correctly. I'll file a separate bug for this.
> (In reply to Robert Helmer [:rhelmer] from comment #24)
> > Landed as
> > http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/rev/
> > 0d9be01ab7ce (accidentally pushed a commit for bug 788055 originally, so
> > backed it out and pushed this instead)
> It is in prod or should we wait Socorro 47?
I filed bug 874113 to deploy this, it doesn't need to wait for a Socorro release.
Depends on: 874113
Comment 27•11 years ago
Forbidden permission on May 23 and 25.
Bug 870029 hasn't been taken into account.
Assignee
Comment 28•11 years ago
(In reply to Scoobidiver from comment #27)
> Forbidden permission on May 23 and 25.
Fixed this, I'll make it part of the backfill script if this needs to be done again.
> Bug 870029 hasn't been taken into account.
I had thought bug 874113 would be closed by now :/ I will poke on that one.
Comment 29•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #28)
> (In reply to Scoobidiver from comment #27)
> > Bug 870029 hasn't been taken into account.
> I had thought bug 874113 would be closed by now :/ I will poke on that one.
Bug 870029 was dependent on a Socorro release while bug 874113 isn't so it's odd they are related.
Assignee
Comment 30•11 years ago
(In reply to Scoobidiver from comment #29)
> (In reply to Robert Helmer [:rhelmer] from comment #28)
> > (In reply to Scoobidiver from comment #27)
> > > Bug 870029 hasn't been taken into account.
> > I had thought bug 874113 would be closed by now :/ I will poke on that one.
> Bug 870029 was dependent on a Socorro release while bug 874113 isn't so it's
> odd they are related.
The script I am using to backfill is outside of Socorro, which is why bug 870029 was not picked up automatically. Bug 874113 should fix the real underlying problem here which is why I lamented it :)
Assignee
Comment 31•11 years ago
I have moved the overrides from bug 870029 into the backfill script (sorry for missing that before!) and have also put in a fix for the "permission denied" problem. Backfill is running for 2013-05-{25..15} now.
BTW, one of the reasons I broke this backfill into a separate script is so that it does not interfere with normal runs. When bug 874113 is resolved I expect this to be fixed, but the backfill will not step on the normal nightly run (or vice versa). Missing things like manual overrides is an unfortunate side effect though.
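One way a backfill script could avoid depending on another job for permissions is to set them itself after writing each file. This is only a sketch: it assumes the 403 errors came from files that were not group- or world-readable, which the thread does not state explicitly, and the function name is hypothetical.

```python
import os
import stat


def make_world_readable(path):
    """Add group and other read bits to path, keeping its existing mode bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRGRP | stat.S_IROTH)
```

Calling this on every generated report at the end of a run would make the result independent of whichever job happens to run next.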
Assignee
Comment 32•11 years ago
(In reply to Robert Helmer [:rhelmer] from comment #31)
> I have moved the overrides from bug 870029 into the backfill script (sorry
> for missing that before!) and have also put in a fix for the "permission
> denied" problem. Backfill is running for 2013-05-{25..15} now.
Oops that should read {28..15}
Assignee
Comment 33•11 years ago
Also, for anyone interested, we are tracking the replacement for this version of the correlations system in bug 875990. The current system has many moving parts that aren't really necessary anymore, and (obviously) has not been as well-maintained as we would like.
Assignee
Comment 34•11 years ago
This should be fixed now (!)
I will continue to monitor, please do let me know if I've missed anything.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED