Closed Bug 573100 Opened 14 years ago Closed 11 years ago

minidump_stackwalk replacement should produce JSON output instead of pipe-delimited text

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Assigned: ted)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 4 obsolete files)

With the new Hadoop-based processor, we won't be using minidump_stackwalk but a custom binary, which gives us the opportunity to improve its output format. Lars tells me that he's currently writing some code to convert the existing pipe-delimited output to JSON anyway, to make future map/reduce jobs easier. The obvious next step would be to make the C++ tool produce JSON output directly. I had thought about this before, and my thoughts were something along the lines of:

{'status': <integer>, // success or failure code, probably from
                      // http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/google_breakpad/processor/minidump_processor.h#47
 'system_info': {'OS': 'Windows NT',
                 'OS version': '6.0.6002 Service Pack 2 ',
                 'CPU': 'x86',
                 'CPU info': 'AuthenticAMD family 15 model 67 stepping 3',
                 'CPU Count': 2},
 'crash_info': {'type': 'EXCEPTION_BREAKPOINT',
                'crash address': '0x123456',
                'crashing thread': 1 // possibly null
               },
 'modules': [{'filename': 'foo.dll',
              'version': '1.2.3',
              'debug file': 'foo.pdb',
              'debug identifier': 'FFFFFFFF',
              'base address': '0x100000',
              'end address': '0x101000',
              'main module': false // true for exe
             },
             ...
            ],
 'threads': [[{'module': 'foo.dll',
               'function': 'DoSomething(int foo)',
               'file': 'hg:hg.mozilla.org/releases/mozilla-1.9.2:foo/foo.cc:0383745fc4de',
               'line': 123,
               'offset': '0xc'
              },
              ... more frames
             ],
             ... more threads
            ]
}
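To illustrate how a consumer might use output shaped like this strawman, here is a minimal Python sketch that looks up the crashing thread and lists its function names. The key names come from the strawman above; the sample values and the helper name `crashing_thread_functions` are hypothetical, and the format that eventually shipped may differ.

```python
import json

# Hypothetical document following the strawman layout from this comment;
# the values are illustrative, not real stackwalker output.
SAMPLE = """
{
  "status": 0,
  "crash_info": {"type": "EXCEPTION_BREAKPOINT",
                 "crash address": "0x123456",
                 "crashing thread": 1},
  "threads": [
    [{"module": "foo.dll", "function": "Idle()", "line": 10, "offset": "0x4"}],
    [{"module": "foo.dll", "function": "DoSomething(int foo)", "line": 123, "offset": "0xc"},
     {"module": "foo.dll", "function": "main", "line": 7, "offset": "0x2"}]
  ]
}
"""


def crashing_thread_functions(doc):
    """Return function names for the crashing thread, or [] if unknown."""
    index = doc.get("crash_info", {}).get("crashing thread")
    if index is None:  # the strawman allows a null crashing thread
        return []
    return [frame.get("function") for frame in doc["threads"][index]]


doc = json.loads(SAMPLE)
functions = crashing_thread_functions(doc)
print(functions)
```

With pipe-delimited text the same lookup would need a custom parser; here it is two dictionary accesses and a list index.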
Blocks: 602209
Target Milestone: --- → 1.7.8
Blocks: 631806
That is nearly identical to what I did in 1.8
After some discussion, here's a revised strawman: http://etherpad.mozilla.com:9000/SocorroStackwalkJSON
No longer blocks: 631806
Blocks: 607831
Attached file Sample JSON output (obsolete) (deleted)
Here's some sample output. It should match the spec on the EtherPad.
I'm very sorry, something didn't translate properly in the etherpad. There was still an array for the frames, and that would make it impossible to search for crashes where the second frame had the signature X. I updated the etherpad with a corrected sample that contains only index-keyed dicts for threads and frames. Take a look and tell me what you think.
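As a rough sketch of the difference being described, the following Python turns arrays of threads and frames into index-keyed dicts, so that "the second frame of the first thread" becomes a fixed key path. The exact key format used on the etherpad is not shown in this bug; string indices and the helper names here are assumptions.

```python
def index_keyed(items):
    """Turn a list into a dict keyed by the string position of each item."""
    return {str(i): item for i, item in enumerate(items)}


def rekey_threads(threads):
    """Convert a list of threads (each a list of frames) to nested dicts."""
    return {str(t): index_keyed(frames) for t, frames in enumerate(threads)}


# Hypothetical input: two threads with placeholder function names.
threads = [
    [{"function": "A"}, {"function": "B"}],
    [{"function": "C"}],
]
keyed = rekey_threads(threads)

# The second frame of the first thread is now reachable by a fixed path.
second_frame = keyed["0"]["1"]["function"]
print(second_frame)
```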
Attached file Sample JSON output (obsolete) (deleted)
Looks sane, easy enough change, here's new sample output that should match the revised spec.
Attachment #514839 - Attachment is obsolete: true
I've pushed the code to my minidump-stackwalk repo on the 'json' branch: http://hg.mozilla.org/users/tmielczarek_mozilla.com/minidump-stackwalk/ You can hg clone that repository, then "hg up json" to get the right code. I also built a copy on khan, it's in ~tmielczarek/minidump-stackwalk/.
That sample looks pretty usable to me. I think maybe we could take a bunch of dump files and run them through this test build so we can feed them into elasticsearch for further testing. Anurag, could you mull that over and come up with some suggestions or a plan?
It would be nice to have ~2K JSON files to test with. I'll set up a test ES index that can be used for this purpose. daniel/ted: what's the easiest way to transform the dumps into JSON? Will I need to install mdsw in my sandbox?
Yeah, I think the easiest approach would probably be to put this mdsw on a server somewhere, grab a bunch of dump files from production, and feed them through mdsw and into ES.
ted: can I install this on my sandbox without needing external libraries? Or is there an existing machine that I can use to get the JSON files?
As I mentioned in comment 6, I have a prebuilt copy on khan at ~tmielczarek/minidump-stackwalk/stackwalker. You pass it a minidump file as the first argument on the command line and symbol paths as additional args (so you can pass /mnt/socorro/symbols/symbols_* there on khan). It also accepts a -p argument to make it pretty-print the JSON (like in the sample here); by default the output will not contain extraneous spaces or newlines. Sample command line I just ran:

  ~tmielczarek/minidump-stackwalk/stackwalker /home/minidumps/f110209f01fdbc4-b010-491c-9e92-f952a2110209.dump /mnt/socorro/symbols/symbols_* 2>/dev/null > output.json
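For feeding many dumps through the tool, the invocation described above could be wrapped in a small Python helper along these lines. This is a sketch only: the function names are hypothetical, and the position of -p (placed before the dump path here) is an assumption, since the comment does not say where the flag must appear.

```python
import json
import subprocess


def build_command(binary, dump_path, symbol_paths, pretty=False):
    """Build the argv list: binary, optional -p, dump file, symbol paths."""
    argv = [binary]
    if pretty:
        argv.append("-p")  # assumption: the flag goes before the dump path
    argv.append(dump_path)
    argv.extend(symbol_paths)
    return argv


def run_stackwalker(binary, dump_path, symbol_paths):
    """Run the tool, discard stderr, and parse stdout as JSON."""
    result = subprocess.run(
        build_command(binary, dump_path, symbol_paths),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # mirrors 2>/dev/null in the sample command
        check=True,
    )
    return json.loads(result.stdout)
```

Because no shell is involved, a pattern such as /mnt/socorro/symbols/symbols_* would need to be expanded first, for example with `glob.glob`, before being passed as `symbol_paths`.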
Blocks: 636868
Blocks: 638204
Attached file schema-ish description (obsolete) (deleted)
Here's a schema-ish description of the output.
Target Milestone: 1.7.8 → 2.0
Actually this bug is already kind of fixed by ES. Search in ES is term-based, so every field is analyzed and separated into terms, and the pipe is considered as a delimiter. That means we can search for something in the dump field without changing anything. Having a more structured schema for the dump would however be useful to create more accurate queries.
Not exactly, because the dump has different sets of data, like "the crashing thread", that's probably not easy to expose, but very useful. (Also, fixing this gives us other useful things aside from searchability, like the ability to easily expose additional fields in the output.)
Doesn't have to land for 2.0, but *must* land in 2.1. This blocks all the stack-frame-search type bugs.
Target Milestone: 2.0 → 2.1
Target Milestone: 2.1 → ---
Attachment #514858 - Attachment is obsolete: true
Attached file Sample JSON output (revised) (obsolete) (deleted)
After discussion with deinspanjer: new versions of ElasticSearch support a search feature that makes it possible to use arrays instead of the uglier format we had before. Here's sample output; I pushed the changes to my minidump-stackwalk repo: http://hg.mozilla.org/users/tmielczarek_mozilla.com/minidump-stackwalk/rev/eb79df82fb02
Attached file schema-ish description (revised) (deleted)
Attachment #533643 - Attachment is obsolete: true
As before, I have compiled this code on khan; the binary is at: /home/tmielczarek/minidump-stackwalk/stackwalker
Component: Socorro → General
Product: Webtools → Socorro
I added the exploitability rating to the output: http://hg.mozilla.org/users/tmielczarek_mozilla.com/minidump-stackwalk/rev/f132d1ca9121 We decided that all sensitive data would live inside a "sensitive" key so that it can be elided when we store the JSON, which lets us display the rest publicly. Thus the output will have: { ... "sensitive": {"exploitability": "..."} ... }
Attached file Sample JSON output (v3) (deleted)
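Eliding the sensitive data before storing or displaying the JSON could look roughly like the Python sketch below. The "sensitive" and "exploitability" keys come from this comment; the helper name `public_view` and the sample values are hypothetical.

```python
import json


def public_view(doc):
    """Return a shallow copy of the output without the "sensitive" key."""
    return {key: value for key, value in doc.items() if key != "sensitive"}


# Hypothetical output; the values are placeholders.
raw = json.loads('{"status": 0, "sensitive": {"exploitability": "high"}}')
public = public_view(raw)
print(json.dumps(public))
```

Keeping everything sensitive under one top-level key means the filter never has to know which individual fields are private.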
I never uploaded a revised version with the "sensitive" key as described above. Also, while reviewing lars' pipe dump->JSON converter elsewhere I realized I had a bug in the JSON output--it didn't include the "crash_info" key. I fixed that, and this sample output includes it.
Attachment #549156 - Attachment is obsolete: true
Depends on: 894458
Blocks: 894483
Depends on: 906131
Blocks: 431514, 767067
This has all been merged into the Socorro tree: https://github.com/mozilla/socorro/tree/master/minidump-stackwalk
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Blocks: 939141