Closed Bug 514593 Opened 15 years ago Closed 15 years ago

SkipList Split up [@ js_Interpret] crash signatures

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jruderman, Assigned: lars)

Details

(Whiteboard: [crashkill][crashkill-metrics][1.9.2-fixed])

js_Interpret is a gigantic function. As a single signature, js_Interpret is the #24 topcrash for Firefox 3.5.2, but this isn't really useful for getting bugs fixed. I can think of two ways to "split it up" server-side, using the line number information that already exists: * Make the line number be part of the signature, e.g. js_Interpret:4546. * Look at the source code to find the most recent BEGIN_CASE line. For example, the section after BEGIN_CASE(JSOP_CALLPROP) might be given the fake signature js_Interpret_JSOP_CALLPROP. The latter would be more stable against changes to the code, but would require the server to have a copy of the relevant version of jsinterp.cpp.
Summary: Split up js_Interpret crash signatures → Split up [@ js_Interpret] crash signatures
The latter sounds really complicated. The former sounds more workable, but is going to really segregate out your crashes. I suppose that might be ok, since crashes on different lines are unlikely to have the same root cause.
Bug 510505 might help here, if we could figure out an inexpensive way to get the current opcode into a skidmark. Alternatively, we could maybe have BEGIN_CASE do some symbol trickery to give socorro something to chew on?
(In reply to comment #2) > Alternatively, we could maybe have BEGIN_CASE do some symbol trickery to give > socorro something to chew on? I was thinking about this, but I'm unaware of any way to convince the compiler to output a different method name for certain blocks of code, aside from actually splitting them into separate methods and forcing them to be inlined. I also thought about trying to post-process the debug symbols, but that started to make my head hurt.
How does socorro deal with inlines? Those are both within the same function, nested, and it seems like we could do something similar here with a synthetic _begin_socorro_region_JSOP_FOO/_end_socorro_region_JSOP_FOO symbol pair. Not sure how to get that done on Windows, but I think there are some directives in gcc that would help us. Alternatively (heh) we could use the valgrind trick of special nop or jump-over-nop sequences to put sentinels in the code that we scan with the socorro tools. Or we could figure out how to reuse the address-of-label stuff we use for token threading, but that only works on GCC too.
The Breakpad symbol dumping tools are generally inline-aware, although Linux stabs may still have some bugs. If you can point out a specific method that gets inlined (and a method calling it), I could show you what it looks like in the symbol files. The Breakpad symbols are just a dump of what the compiler's debug info tells us, so modulo post-processing (which would be a bit painful), we're limited to what the compiler produces. The Breakpad symbol format basically looks like: FUNC address size name lineaddress size line file The line data can specify a new file at any point inside the function. Post-processing to break it up would require recalculating the function size, and adding synthetic functions with the appropriate address+size. Also, jimb wrote up some nice docs on the symbol file format if you care to know more: http://code.google.com/p/google-breakpad/wiki/SymbolFiles
Bug 507723 changed where js_Interpret code lives. Now much of it is in jsops.cpp. See e.g. reed's crash: http://crash-stats.mozilla.com/report/index/3e9f5932-2d9b-4fd5-aaad-8de102090907
(In reply to comment #5) > The Breakpad symbol dumping tools are generally inline-aware, although Linux > stabs may still have some bugs. The Linux STABS dumper doesn't deal with inlined functions at all. The new DWARF-on-Linux reader doesn't deal with them yet, but it's on the agenda.
I think the line number idea is good enough for a start. If we find that it is splitting things up too much, we can deal with that later.
No longer blocks: 519363
Bug 519363 comment 1 lists the top five crash points in js_Interpret for 1.9.1.
Whiteboard: [crashkill][crashkill-metrics]
We really need to get on this; @js_Interpret is the Firefox 3.6 Beta 1 topcrash by an order of magnitude.
(In reply to comment #10) > We really need to get on this; @js_Interpret is the Firefox 3.6 Beta 1 topcrash > by an order of magnitude. I looked at a sample of the "last" 500 of these in FF3.6b1 just now. I found that 95% were covered by bug 525028 and bug 519363. Both are already marked blocking 1.9.2 so we just need to get them landed.
If we want to make sure that we're not shipping new js_interp regressions, we need this before we ship. Marking this as blocking 1.9.2+ so we can argue about it. From what I'm hearing from dmandelin, we need this to know.
Flags: blocking1.9.2+
Summary: Split up [@ js_Interpret] crash signatures → SkipList Split up [@ js_Interpret] crash signatures
I forgot to mention earlier that another reason this is important is that the core-count/dll/add-on distribution analyses don't produce useful data if all js_Interpret crashes are grouped together, but might tell us some good stuff once they are split.
Another note: the "SkipList" in the title suggests that this is about adding js_Interpret to the SkipList, but that is not what we want, and won't help. Rather, we want the line number of the crash point added to the signature.
For example, in https://crash-stats.mozilla.com/report/index/12914933-360e-4d39-a05a-b7d422091123 the signature would be: "js_Interpret:4436" and in https://crash-stats.mozilla.com/report/index/47b10c2a-75ec-4192-99c2-3c6692091123 it would be "js_Interpret:3046" confirm this for me and I'll get on it.
(In reply to comment #15) > For example, in > https://crash-stats.mozilla.com/report/index/12914933-360e-4d39-a05a-b7d422091123 > the signature would be: "js_Interpret:4436" and in > https://crash-stats.mozilla.com/report/index/47b10c2a-75ec-4192-99c2-3c6692091123 > it would be "js_Interpret:3046" > > confirm this for me and I'll get on it. That looks great.
ok, I've got this implemented and working on my dev system. I'm going to get it pushed on to staging tomorrow, but there is going to have to be some fancy dancing to do it. There is another change working its way through staging/production push, but it is stalled because of an yet unresolved problem. It is likely we're going have to back out that change to push this change through. I will keep this thread updated on progress...
Target Milestone: --- → 1.2
Assignee: nobody → lars
Does this add line numbers to all signatures? Or just to js_Interpret? I'm somewhat worried that adding it to all functions is going to make analysis across versions harder. It would be great if we could just have a whitelist of functions to which the line number was included. And let that whitelist just consist of js_Interpret for now. Though this is all IMHO, others might disagree.
This should just be to js_Interpret.
The processor uses a configurable parameter called 'signaturesWithLineNumbersRegEx' to identify signatures that must be combined with their associated line numbers. Currently, the default is just 'js_Interpret'. If you have others to add, it is just a configuration change. This feature is on track to be added to the production instance of Socorro in our first coordinated release Thursday night. We're sorry this has taken so long to get into production. As our group has increased in size and scope, we've had to move away from the cowboy software production model to a more coordinated system. This new structured release schedule will keep our group from stepping on our own proverbial toes.
this is complete and in production. For an example, see this one: http://crash-stats.mozilla.com/report/index/5a4d380b-7951-4191-86ca-6e0662091203 the new signatures should start showing up in trend reports in the next few hours (it's 2009-12-03 22:30pst now).
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [crashkill][crashkill-metrics] → [crashkill][crashkill-metrics][1.9.2-fixed]
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.