Closed Bug 802106 Opened 12 years ago Closed 12 years ago

Breakpad stackwalking optimization: optimize PostfixEvaluator

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: ted, Assigned: jseward)

References

Details

Attachments

(4 files, 2 obsolete files)

Julian did a callgrind profile of his SPS unwinder that uses Breakpad for unwinding, and it shows that we spend a huge portion of time in the PostfixEvaluator class: http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/postfix_evaluator.h http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/postfix_evaluator-inl.h Specifically, the unwind rules in Breakpad symbol files are specified as postfix expressions translated from DWARF CFI, and every time we walk the stack from a callee frame to its caller we build a PostfixEvaluator, feed it the current register state, and evaluate one of these expressions to recover the caller register values. See: http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/stackwalker_amd64.cc#122 http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/cfi_frame_info.cc#46 Most of the damage seems to be fallout from heavy use of the STL: lots of memcmp from comparing string keys and an awful lot of mallocing. I suspect we could get some pretty large wins if we could just switch from string keys to int keys by maintaining a string table.
Attached patch Speed up the prefix evaluator, first attempt (obsolete) (deleted) — Splinter Review
This replaces the use of std::string keys in the dictionaries used for evaluation, with a new type UniqueString*, basically an interned string for which equality can be done via pointer equality. It also changes the type of the evaluator stack element from std::string to a new type StackElem, which is a tagged union of ValueType and UniqueString*. This makes the evaluator a bit clearer and avoids storing literals (numbers) in the stack as text; the latter clearly is inefficient in that it causes arithmetic on literals to require conversion of operands to/from strings. This helps, although it doesn't give a huge speedup. In the SPS integration it reduces the cost of StackwalkerAMD64::GetCallerFrame from about 150k instructions down to 100k, better but still insanely expensive. One background reason is there is still a lot of std::string activity present because the expression strings are tokenised and parsed using std::string.
Assignee: nobody → jseward
Give "expressions" (the things evaluated by the postfix evaluator) their own type (Module::Type) rather than merely being strings. Allow the type to have special non-string representation for the special cases "identifier + immediate" and "*(identifier + immediate)". These special cases can represent all the Dwarf Cfi expressions we generate, and so allow Cfi unwinding to happen without endlessly having to parse strings. Also enhance the postfix evaluator to handle these special cases. This speeds up StackwalkerAMD64::GetCallerFrame by very roughly a factor of (150k insns/frame before, 46k/frame after).
This looks great! After reading your patch I can offer some more concrete suggestions. My original idea was that we'd be able to avoid making changes to PostfixEvaluator at all, which should make the patch a bit conceptually simpler. I thought that perhaps in CFIFrameInfo::FindCallerRegs: http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/cfi_frame_info.cc#73 we could somehow lazily init the PostfixEvaluator and only use it if the unwind rule required it.
Rehash of the UniqueString patch, to be applied on top of the patch in comment 2. This patch replaces the used of std::string keys with a new type, UniqueString*, which only ever has one copy of any string and hence allows string equality to be done by pointer equality. In combination with Ted's direct-from-the-dwarf-sections patches and the Cfi rule re-representation patch in comment 2, this allows CFI unwinding to happen with essentially no std::string allocation or comparison. According to my rough measurements, this reduces the cost of calls to StackwalkerAMD64::GetCallerFrame from about 46kI/frame to about 26kI/frame. This is not as aggressive as changing of keys directly to integers, but it does sidestep the problem of various evaluators (particularly stackwalker_x86.cc) stuffing entirely arbitrary string keys into the evaluation dictionaries and expecting everything still to work. This patch contains some "backwards compatibility" impedance matching, so that uses of the plain postfix evaluator with plain strings, still works right (although untested). It also contains a cleanup change to postfix_evaluator-inl.h which should possibly be factored out into its own patch: it changes the type of the stack elements from strings to a union of string (actually, of course, UniqueString*) and ValueType. This makes the code clearer and avoids having to convert literal values back and forth to strings during evaluation.
Attachment #672755 - Attachment is obsolete: true
One final speedup patch for the unwinder, that applies on top of the patch in comment 4. This changes the type of the dictionaries used in evaluation from instantiations of std::map to a new type, UniqueStringMap<ValueType>. This has two advantages: * it allows precise measurement/profiling of dictionary usage stats * it allows using a optimised map implementations UniqueStringMap optimises the small-map case. From some testing it appears that almost all maps (dictionaries) have 10 or fewer elements. UniqueStringMap therefore stores (key,value) pairs in a fixed array of 10 elements and performs linear search for access. If this array fills up (very rarely), then it allocates and uses a std::map to handle the overflow. This easily doubles the performance of StackwalkerAMD64::GetCallerFrame, reducing the cost from about 26kI/frame to 11kI/frame. Profiling shows it removes almost all the std::map activity during CFI unwinding. Overall cost when testing SPS on x86_64-linux, for max-32 element stack unwinds, shows a cost of about 670K insns/unwind. SPS can sample at 1KHz on this machine (Core i5) at less than 40% cpu utilisation now.
Attached patch Fix assignment operator in UniqueStringMap (obsolete) (deleted) — Splinter Review
This applies on top of the patch in comment 5, and fixes a problem with it. UniqueStringMap contains a raw pointer, and the default assignment operator for it does not handle that correctly, leading to read/write after free and double frees, on arm-android. If we had used the default copy constructor the same problem would exist. This patch fixes both cases.
Attached patch Misc fixes for arm-android (deleted) — Splinter Review
Misc fixes needed to build/use native unwind on arm-android: * don't call abi::__cxa_demangle since it doesn't exist * ~MmapWrapper(): don't munmap unless base_/size_ seem sane Applies on top of the patch in comment 6.
Comment on attachment 674594 [details] [diff] [review] Change representation of Cfi rules internally I fixed a few style issues that I knew would get dinged in review, and also fixed some of the Breakpad unit tests, and got this patch up for review: https://breakpad.appspot.com/516004/
I rolled up all the remaining patches here along with the Breakpad bits of the patch from bug 812133 and pushed them to Try: https://tbpl.mozilla.org/?tree=Try&rev=717a458f3518 It all built fine on my local Linux machine.
Comment on attachment 675297 [details] [diff] [review] Change std::string keys to UniqueString*, second try After working out a few small issues with Julian, and cleaning up some style nits, this patch is up for upstream review here: https://breakpad.appspot.com/519002/
Comment on attachment 675301 [details] [diff] [review] Replace eval-dictionary mappings with a specialised type, UniqueStringMap This patch (with the "fix assignment operator" patch rolled into it) is up for upstream review: https://breakpad.appspot.com/520002/
Attachment #678232 - Attachment is obsolete: true
Comment on attachment 678240 [details] [diff] [review] Misc fixes for arm-android Uploaded this patch for upstream review: https://breakpad.appspot.com/521002 I didn't really understand the purpose of the dump_symbols MmapWrapper change, so I left it out. If it turns out to be important we can put it back, but it feels hacky.
Depends on: 838882
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: