Closed Bug 938157 Opened 11 years ago Closed 11 years ago

Lightweight CFI/EXIDX unwinding library for SPS

Categories

(Core :: Gecko Profiler, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: jseward, Assigned: jseward)

References

Details

Attachments

(2 files, 17 obsolete files)

(deleted), patch
BenWa
: review+
Details | Diff | Splinter Review
(deleted), text/plain
Details
Rather than continue to try to improve the performance of the existing Breakpad-based native unwinder, it seems more promising to reimplement the core CFI unwind loop so as to avoid all table-based access, whilst reusing the Breakpad ELF and Dwarf CFI parsers, as they are known to work and do not comprise a performance bottleneck.
Attached file tar-NewLib-2013nov13.tar-check5 (obsolete) (deleted) —
Standalone demonstrator/development tarball. Reads its own CFI and does a test unwind. Works on x86_64-linux, x86-linux and arm-linux. To use: untar, then: # edit BUILD.sh to select compiler + flags sh ./BUILD.sh ./Obj/hackymain
Attached file tar-NewLib-2013nov14.tar-check6 (obsolete) (deleted) —
With minor restructuring/cleanups, suitable for initial hook-up-to-SPS testing.
Attachment #831536 - Attachment is obsolete: true
Nice. I had a very quick skim, and my main observations are about integration-type stuff. - Is the fact that it's a separate library fundamental? Was it just easier to test that way? Just curious. - It uses the C++ standard library, e.g. std::{map,string,vector,stack}. We generally avoid that in Mozilla code because we don't use C++ exceptions, though it's not a restriction that's universally applied. So... hmm. - How much heap memory does it allocate? If it's significant -- i.e. enough that we'd want to measure and show it in about:memory -- it would be nice to have a way to register a custom allocator, as per https://blog.mozilla.org/nnethercote/2013/11/08/libraries-should-permit-custom-allocators/.
(In reply to Nicholas Nethercote [:njn] from comment #3) > - Is the fact that it's a separate library fundamental? > Was it just easier to test that way? Just curious. Not fundamental. Just easier to test/develop. > - It uses the C++ standard library, e.g. std::{map,string,vector,stack}. We > generally avoid that in Mozilla code because we don't use C++ exceptions, > though it's not a restriction that's universally applied. > So... hmm. If that's the case, then we're also using the STL at least because it's dragged in by the Breakpad bits linked into libxul as a result of using Breakpad to do SPS unwinding. But in any case, perhaps it'd be possible to replace those with our own classes. > - How much heap memory does it allocate? If it's significant -- i.e. enough > that we'd want to measure and show it in about:memory -- it would be nice to > have a way to register a custom allocator, My estimate for the initial implementation is several tens of megabytes (50 ish ?) extra residency. Per https://blog.mozilla.org/jseward/2013/09/03/how-compactly-can-cfiexidx-stack-unwinding-info-be-represented/ that can be pushed down a lot further, by perhaps a factor of 5. But I haven't done that so far since the emphasis so far has been on getting it working at all. That notwithstanding, having it show up in about:memory is a good plan.
Attached patch bug938157-newlib-1.diff (obsolete) (deleted) — Splinter Review
Patch that adds the library to the tree, and makes it compilable. Is not actually hooked up to SPS at this point
Attachment #832127 - Attachment is obsolete: true
W.r.t. STL usage, https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code#C.2B.2B_and_Mozilla_standard_libraries documents things. Some classes are considered ok to use, but "as a general rule of thumb, prefer the use of MFBT or XPCOM APIs to standard C++ APIs". In particular, whichever structures are large enough to warrant a memory reporter should use Mozilla types because they have SizeOf* functions.
Attached patch Builds and works for x86_64-linux (obsolete) (deleted) — Splinter Review
Initial wire-up for x86_64-linux. Produces plausible looking profiles. Most critical current problem is the need to add stack scan functionality so it can in some cases step out of leaf library frames for which there is no unwind info (eg, libc).
Attachment #8335247 - Attachment is obsolete: true
Adds stack-scan functionality and reinstates stats lines. Initial callgrinding shows costs coming in around 310 insns per CFI-recovered frame on x86_64-linux.
Attachment #8339964 - Attachment is obsolete: true
Attached patch Fixes stack-scan perf problems (adds SegArray) (obsolete) (deleted) — Splinter Review
Now with a perf-robust implementation of stack scanning.
Attachment #8340810 - Attachment is obsolete: true
Still needs EXIDX reading to be of any use on arm-android, though.
Attached patch bug938157-newlib-6.cset (obsolete) (deleted) — Splinter Review
Adds EXIDX support for arm-android. Also better debug logging on all targets.
Attachment #8341089 - Attachment is obsolete: true
Attachment #8343700 - Attachment is obsolete: true
Attached patch bug938157-newlib-7.cset (obsolete) (deleted) — Splinter Review
Adds stack scanning support for ARM. In summary, this patch provides the new library and integration with SPS, with the following functionality: x86_64-linux: CFI based unwinding and stack scanning arm-android: CFI and EXIDX based unwinding, stack scanning, hookup with faulty.lib (so profiling can work direct out of .apks, with no need to unpack)
Attachment #8349403 - Attachment is obsolete: true
Attached patch bug938157-newlib-8.cset (obsolete) (deleted) — Splinter Review
Adds CFI and stack-scan support for x86-linux (32 bit).
Attachment #8350232 - Attachment is obsolete: true
Attached patch bug938157-newlib-9.cset (obsolete) (deleted) — Splinter Review
* fix segfault when starting with MOZ_PROFILER_STARTUP set * fix biasing on all targets -- greatly improves stacks on desktop linux * handle __kernel_vsyscall on x86-linux * allow objects with .exidx but no .extab * fix a small memory leak
Attachment #8360419 - Attachment is obsolete: true
Attached patch bug938157-newlib-11b.cset (obsolete) (deleted) — Splinter Review
First version that does not cause build failures on try.
Attachment #8363548 - Attachment is obsolete: true
(In reply to Julian Seward from comment #15) > Created attachment 8366741 [details] [diff] [review] > bug938157-newlib-11b.cset > > First version that does not cause build failures on try. Yay. Also, a name suggestion only slightly less silly than the current one: "Five O'Clock". Why? "Because it's time to unwind"
(In reply to Nathan Froyd (:froydnj) from comment #16) > Also, a name suggestion only slightly less silly than the current one: "Five > O'Clock". Why? "Because it's time to unwind" LOL
Attached patch bug938157-newlib-12.cset (obsolete) (deleted) — Splinter Review
* fixes unwind problems on ARM. Now unwinds robustly through system libraries and our code on arm-android, x86-linux, x86_64-linux. * copies in a version of jchen's fix to 959931 (Unwinder loads libmozglue.so a second time) since this patch will otherwise delete jchen's fix, I think. This is the first version that generates robust, usable unwinds on arm-android. Appears to assert in debug builds. Investigating. If you want to try it, do a non-debug build.
Attachment #8366741 - Attachment is obsolete: true
Attached patch bug938157-newlib-12a.cset (obsolete) (deleted) — Splinter Review
Minor pre-review cleanups. Here's a summary of what's in the patch: newlib_common.cpp newlib_common_ext.h newlib_common_int.h Supporting infrastructure for the ELF/Dwarf/EXIDX parsers. Taken from (our copy of) Breakpad. newlib_dwarf.cpp newlib_dwarf_ext.h newlib_dwarf_int.h The Dwarf CFI parser. Taken from Breakpad, but with modifications so as to hand the resulting information to a Summariser (see below). newlib_elf.cpp newlib_elf_ext.h newlib_elf_int.h The ELF parser. Finds the Dwarf and EXIDX sections and hands off to their respective parsers. Taken from Breakpad. newlib_exidx.cpp newlib_exidx_ext.h The EXIDX parser. Taken from Breakpad. Has its own summariser. newlib_main.cpp newlib_main.h Top level control of the library, external interface, and the main unwind loop. New code. newlib_platform_macros.h Defines macros to allow consistent per-platform conditional compilation. newlib_secmap.h Defines some important data structures: NExpr -- a simple expression used to recover a register value RuleSet -- holds the recovery NExpr's for all registers in a given (code) address range SecMap -- holds all the RuleSets for a shared object, sorted to allow fast lookup newlib_summariser.cpp newlib_summariser.h The Summariser holds a RuleSet whilst it is being constructed, and receives information incrementally from the Dwarf parser. That information is added to the RuleSet if possible. When done, if the RuleSet contains enough useful information, it is added to the SecMap for the shared object to make it available for unwinding. UnwinderThread2.cpp: Not a new file. Changes here - remove the interface to Breakpad - add an interface to newlib - changes it so information about all objects is read at startup of the unwinder thread, rather than incrementally on-demand as was the case when using Breakpad.
Attachment #8371620 - Attachment is obsolete: true
Attachment #8372965 - Flags: review?(n.nethercote)
(In reply to Nathan Froyd (:froydnj) from comment #16) > Also, a name suggestion only slightly less silly than the current one: "Five > O'Clock". Why? "Because it's time to unwind" Tempted as I am to go with your ingenious witticism, it may be the case that it would be a little wiser to choose a name for which we don't have to repeatedly explain the rationale from now until the end of time. I therefore propose the significantly duller but more directly descriptive "Lightweight Unwind Library" (LUL). Would there be any objections to that?
(In reply to Julian Seward from comment #20) > (In reply to Nathan Froyd (:froydnj) from comment #16) > > Also, a name suggestion only slightly less silly than the current one: "Five > > O'Clock". Why? "Because it's time to unwind" > > Tempted as I am to go with your ingenious witticism, it may be the case > that it would be a little wiser to choose a name for which we don't have > to repeatedly explain the rationale from now until the end of time. > I therefore propose the significantly duller but more directly > descriptive "Lightweight Unwind Library" (LUL). Would there be any > objections to that? I have no objections. Googleable, descriptive, slightly amusing acronym, etc...
If only you could cram a Z on the end there. :)
Attachment #8372965 - Flags: review?(nfroyd)
Depends on: 971666
Comment on attachment 8372965 [details] [diff] [review] bug938157-newlib-12a.cset Review of attachment 8372965 [details] [diff] [review]: ----------------------------------------------------------------- > newlib_common.cpp > newlib_common_ext.h > newlib_common_int.h > newlib_elf.cpp > newlib_elf_ext.h > newlib_elf_int.h > newlib_exidx.cpp > newlib_exidx_ext.h So all these files are simply copied? If so, I won't bother reviewing them. How is this going to be tested? Are there existing SPS tests that will suffice? I've reviewed all the non-copied .h files, which is enough for now. It looks pretty good. Most of the comments are about Mozilla style things, or places where your C code could be made into C++ :) W.r.t. style, don't bother changing the code you've obtained from elsewhere, but do change the style for new code. - The style guide is here: https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style - New files need mode lines: https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style#Mode_Line - New files need copyright notices; I guess you'll add them once that other bug has clarified what form they should take. - Filenames like StudlyCaps.cpp names are typical for Moz code. - Various kinds of variable names have prefixes: mFooBar for normal class members, sFooBar for static class members, aFooBar for arguments, and gFooBar for globals. It's a bit weird at first, but I've come to like it, because it's really clear exactly what you're referring to, and it makes bugs due to variable shadowing much less likely. - C++'s |nullptr| should be used rather than |NULL|. I think the rationale is that it's more type-safe, though I don't recall the details. You've defined your own integer typedefs: NUWord and all that. Don't do that; it just makes the code harder to read for anyone who's not used to it. Just use the ones from <stdint.h>: uintptr_t, uint32_t, int32_t, uint16_t, int16_t, and uint8_t. And size_t and off_t. ::: tools/profiler/moz.build @@ +46,5 @@ > 'local_debug_info_symbolizer.cc', > ] > > if CONFIG['OS_TARGET'] in ('Android', 'Linux'): > + # These files cannot be built in unified mode because of name clashes with mozglue headers on Android. What kind of name clashes? ::: tools/profiler/newlib_main.cpp @@ +164,5 @@ > + if (lo > hi) { > + // not found > + return NULL; > + } > + long int mid = (lo + hi) / 2; This computation can overflow. http://googleresearch.blogspot.com.au/2006/06/extra-extra-read-all-about-it-nearly.html explains how it should be done. I bet Valgrind's 19 different copies of binary search have the same bug :) ::: tools/profiler/newlib_main.h @@ +1,5 @@ > +/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ > +/* This Source Code Form is subject to the terms of the Mozilla Public > + * License, v. 2.0. If a copy of the MPL was not distributed with this > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + This header lacks a top-level #ifndef guard. @@ +3,5 @@ > + * License, v. 2.0. If a copy of the MPL was not distributed with this > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +#include <pthread.h> // rwlock_t > +#include <unistd.h> // off_t <sys/types.h> defines off_t, AIUI. @@ +14,5 @@ > + > +// A machine word plus validity tag. > +typedef > + struct { NUWord value; bool valid; } > + TaggedNUWord; In C++ you can just do |struct TaggedNUWord { ... }|, no typedef needed. But more OO-ness would be nice, viz: class TaggedWord { TaggedWord() : mValue(0) , mValid(false) {} TaggedWord(uintptr_t tw) : mValue(tw) , mValid(true) {} TaggedWord add(TaggedWord tw1, TaggedWord tw2) { return (tw1.mValid && tw2.mValid) ? TaggedWord(tw1.mValue + tw2.mValue) : TaggedWord(); } bool isAligned() { return mValid && (mValue & (sizeof(uintptr_t) - 1)) == 0; } uintptr_t mValue; bool mValid; }; You don't need to mark them inline, that's implicit in C++ for methods defined inline within the class declaration. You could even overload operator+ instead of add, but that's probably overkill. You could even think about making mValue and mValid private and providing getters/setters. Depends on how the type is used. And are mValue and mValid const? If so, it'd be worth marking them accordingly. Is there a reason why this type isn't within the |newlib| (or |LUL|, or whatever) namespace? @@ +42,5 @@ > + > +#if defined(NEWLIB_ARCH_arm) > +typedef > + struct { > + TaggedNUWord r7; Looks like there's trailing whitespace here, and in various other places. I'm sure you can set up Emacs to highlight it. @@ +60,5 @@ > + } > + UnwindRegs; > +#else > +# error "Unknown plat" > +#endif Again, just |struct UnwindRegs { ... };| works in C++. Also, put the #if inside the struct? @@ +65,5 @@ > + > +} // namespace newlib > + > + > +namespace newlib { Opening and closing the |newlib| namespace like this is odd and not necessary. Removing them will allow you to move the big comments for each class immediately above the class. @@ +68,5 @@ > + > +namespace newlib { > + > +/* CONFIGURABLE */ > +/* The maximum number of bytes in a stack snapshot */ You're using a mixture of C and C++ comments. I'd just use C++ throughout. @@ +69,5 @@ > +namespace newlib { > + > +/* CONFIGURABLE */ > +/* The maximum number of bytes in a stack snapshot */ > +#define NEWLIB_N_STACK_BYTES 32768 Use |static const size_t N_STACK_BYTES|. C++ has true constants, so you don't need to #define constants. And the NEWLIB_ prefix shouldn't be necessary since this is in the |newlib| namespace. @@ +73,5 @@ > +#define NEWLIB_N_STACK_BYTES 32768 > + > +// The stack chunk image that will be unwound. > +typedef > + struct { Again, the typedef isn't needed. @@ +76,5 @@ > +typedef > + struct { > + // [start_avma, +len) specify the address range in the buffer. > + // Obviously we require 0 <= len <= NEWLIB_N_STACK_BYTES. > + NUWord start_avma; What does avma mean? You've used it in several places where I'd probably just use |p|. @@ +88,5 @@ > + > +// The core unwinder library class. Just one of these is needed, and > +// it can be shared by multiple unwinder threads. Access to the > +// library is single-threaded except for Unwind, which can be called > +// by multiple threads simultaneously. I don't understand this second sentence. I think you can just remove it, because the next paragraph explains it more clearly. Other than that, it's a lovely comment. @@ +119,5 @@ > +class SegArray; > +class CFICache; > + > +class NewLib { > + public: Moz style is to not indent |public| keywords at all. @@ +124,5 @@ > + // Create; supply a logging sink. Initialises the rw-lock. > + NewLib(void (*log)(const char*)); > + > + // Destroy. Requires w-lock. By doing that, waits for all unwinder > + // threads to leave ::Unwind calls. All resources are freed and all s/leave/leave in-flight/ ? @@ +128,5 @@ > + // threads to leave ::Unwind calls. All resources are freed and all > + // registered unwinder threads are deregistered. > + ~NewLib(); > + > + // Notify of a new rx- mapping, and load the associated unwind info. s/rx-/r-x/ ? @@ +135,5 @@ > + // itself, so as to be able to read the unwind info. If > + // mapped_image is non-NULL then it is assumed to point to a > + // called-supplied and caller-managed mapped image of the file. > + // > + // Acquires m_rwl in w-mode. This must be called only after the It took me a minute to work out what w-mode and r-mode were. Are they standard terms? Would "Acquires m_rwl as a reader/writer" be clearer? @@ +178,5 @@ > + // > + // Up to scannedFramesAllowed stack-scanned frames may be recovered. > + // > + // The calling thread must previously have registered itself via > + // RegisterUnwinderThread. What happens if it hasn't? What does the return value signify? @@ +187,5 @@ > + UnwindRegs* startRegs, StackImage* stackImg); > + > + private: > + // The one-and-only lock for the library. > + pthread_rwlock_t m_rwl; The |m_| prefix almost follows Moz style :) |mRWLock| would be better, I found "rwl" a little too cryptic. @@ +201,5 @@ > + SegArray* m_segArray; > + > + // The thread-local data; a mapping from threads to CFI-fast-caches. > + // The map requires m_rwl r-locked to read, w-locked to write. The > + // CFICaches themselves are thread local and can be both read and Hyphen after "thread". @@ +208,5 @@ > + std::map<pthread_t, CFICache*> m_caches; > + > + // Invalidate the caches. Requires m_rwl to be held in w-mode; does > + // not acquire it itself. > + void InvalidateCFICaches(); It's common to put methods before data members within public/private sections. @@ +211,5 @@ > + // not acquire it itself. > + void InvalidateCFICaches(); > + > + public: > + // the logging sink Full sentence for the comment, please. ::: tools/profiler/newlib_platform_macros.h @@ +15,5 @@ > + and OS_ macros are defined too, since they are sometimes > + convenient. */ > + > +#undef NEWLIB_PLAT_arm_android > +#undef NEWLIB_PLAT_amd64_linux "x64" is typically used these days instead of "amd64". E.g. IonMonkey does it. @@ +49,5 @@ > + > +#elif defined(__linux__) && defined(__i386__) > +# define NEWLIB_PLAT_x86_linux 1 > +# define NEWLIB_ARCH_x86 1 > +# define NEWLIB_OS_linux 1 Why are the __linux__ ones separated by the __ANDROID__ ones? @@ +64,5 @@ > + > +#elif (defined(_MSC_VER) || defined(__MINGW32__)) && (defined(_M_IX86) || defined(__i386__)) > +# define NEWLIB_PLAT_x86_windows 1 > +# define NEWLIB_ARCH_x86 1 > +# define NEWLIB_OS_windows 1 Why have Windows and Darwin constants? Isn't this library Linux/Android only? ::: tools/profiler/newlib_secmap.h @@ +1,1 @@ > + No license here. @@ +15,5 @@ > +// defined in the ELF ABI supplements for each architecture (??) > + > +// No real register has this number. It's convenient to be > +// able to treat the CFA as "just another register", though. > +#define DW_REG_CFA (-1) What is the "CFA"? @@ +24,5 @@ > +# define DW_REG_ARM_R11 11 > +# define DW_REG_ARM_R12 12 > +# define DW_REG_ARM_R13 13 > +# define DW_REG_ARM_R14 14 > +# define DW_REG_ARM_R15 15 It's pretty common to use enums to define related constants like this. It gives a bit more type safety. Also, these can go in the |newlib|/|LUL| namespace. @@ +44,5 @@ > +//////////////////////////////////////////////////////////////// > +// NExpr // > +//////////////////////////////////////////////////////////////// > + > +// An expression. If |srcReg| is DW_REG_CFA (-1) then it denotes the What kind of expression? Is this a DWARF concept? @@ +50,5 @@ > +// values. > + > +#define NExpr_UNKNOWN ((NUChar)10) // This NExpr denotes no value > +#define NExpr_NODEREF ((NUChar)11) // Value is (.srcReg + .offset) > +#define NExpr_DEREF ((NUChar)12) // Value is *(.srcReg + .offset) I don't understand how you chose 10, 11 and 12 for these values. 11 and 12, in particular, collide with DM_REG_ARM_R{11,12}. @@ +57,5 @@ > + > +struct NExpr { > + NExpr() : how(NExpr_UNKNOWN), reg(0), offset(0) {} > + NExpr(NUChar how, NShort reg, NInt offset) : how(how), > + reg(reg), offset(offset) {} https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style#Classes shows the style that should be used for C++ classes. @@ +58,5 @@ > +struct NExpr { > + NExpr() : how(NExpr_UNKNOWN), reg(0), offset(0) {} > + NExpr(NUChar how, NShort reg, NInt offset) : how(how), > + reg(reg), offset(offset) {} > + // Change the offset. Bad: what if |.how| isn't NExpr_NODEREF? Add an assertion? @@ +59,5 @@ > + NExpr() : how(NExpr_UNKNOWN), reg(0), offset(0) {} > + NExpr(NUChar how, NShort reg, NInt offset) : how(how), > + reg(reg), offset(offset) {} > + // Change the offset. Bad: what if |.how| isn't NExpr_NODEREF? > + NExpr add_delta(long delta) { Opening brace on the next line, here and below. @@ +70,5 @@ > + } > + > + NUChar how; // One of the three NExpr_* constants > + NShort reg; > + NInt offset; Naming: mHow, mReg, mOffset. @@ +74,5 @@ > + NInt offset; > +}; > + > + > +} It's common to comment end of namespaces, viz: } // namespace newlib @@ +98,5 @@ > +// |.srcReg| == DW_REG_CFA since we have no previous value for the CFA. > +// All of the other |_expr| fields can -- and usually do -- specify > +// |.srcReg| == DW_REG_CFA. > +// > +// With that in place, the unwind algorithm proceeds as follows: s/:/./ @@ +100,5 @@ > +// |.srcReg| == DW_REG_CFA. > +// > +// With that in place, the unwind algorithm proceeds as follows: > +// > +// Initially: we have values for the old registers, and a memory image Missing full stop. @@ +124,5 @@ > +namespace newlib { > + > +class RuleSet { > + > + public: Zero-indentation. @@ +176,5 @@ > + // exactly the lowest and highest addresses that any of the entries > + // in this SecMap cover. Hence invariants: > + // > + // m_ruleSets is nonempty > + // <==> m_summaryMinAddr <= m_summaryMaxAddr I'm used to seeing "<=>", but I guess "<==>" is even stronger? :P @@ +193,5 @@ > + > + SecMap(const char* filename, void(*log)(const char*)); > + ~SecMap(); > + > + // binary search m_ruleSets to find one that brackets |ia|. s/binary/Binary/ @@ +211,5 @@ > + bool IsEmpty(); > + > + size_t Size() { return m_ruleSets.size(); } > + > + // min/max summary addr, invariants as per comments above Full sentence, plz. Likewise for several of the comments below in this class. ::: tools/profiler/newlib_summariser.h @@ +3,5 @@ > + * License, v. 2.0. If a copy of the MPL was not distributed with this > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +#ifndef NEWLIB_SUMMARISER_H > +#define NEWLIB_SUMMARISER_H No need to shout! NewLibSummariser_h is typical Moz style. @@ +9,5 @@ > +#include "newlib_secmap.h" > + > +namespace newlib { > + > +class Summariser { Brace on the next line. @@ +21,5 @@ > + int new_reg, int old_reg, long offset, bool deref); > + > + private: > + // The SecMap in which we park the finished summaries (RuleSets). > + SecMap* secmap_; Trailing underscores here, eh? mSecMap, plz :)
Attachment #8372965 - Flags: review?(n.nethercote) → feedback+
Comment on attachment 8372965 [details] [diff] [review] bug938157-newlib-12a.cset Talked with Nick about this yesterday. I'm going to cancel the review until the revised version goes up, mainly so Bugzilla will stop reminding me about it.
Attachment #8372965 - Flags: review?(nfroyd)
Attached patch bug938157-newlib-13d.cset (obsolete) (deleted) — Splinter Review
In reply to review comments in comment 23: > Review of attachment 8372965 [details] [diff] [review] [diff] [review]: > ----------------------------------------------------------------- Summary: * Copyright notices fixed, as per bug 971666 * All issues in your review fixed, unless otherwise noted below. * Renamed all files and types so as to get rid of "newlib" and instead have "LUL" or "Lul". >> newlib_common.cpp >> newlib_common_ext.h >> newlib_common_int.h >> newlib_elf.cpp >> newlib_elf_ext.h >> newlib_elf_int.h >> newlib_exidx.cpp >> newlib_exidx_ext.h > So all these files are simply copied? If so, I won't bother reviewing them. These are all copied from toolkit/crashreporter/google-breakpad/src, although none of them are 1:1 copies of files in that subtree. Rather they are assembled from various bits of various files in there. > How is this going to be tested? Are there existing SPS tests that will > suffice? How to test is still open. This patch doesn't contain any testing functionality. > ::: tools/profiler/moz.build > @@ +46,5 @@ > > 'local_debug_info_symbolizer.cc', > > ] > > > > if CONFIG['OS_TARGET'] in ('Android', 'Linux'): > > + # These files cannot be built in unified mode because of name clashes with mozglue headers on Android. > > What kind of name clashes? I didn't add this comment -- I just changed the leading whitespace so it fitted better in the file. A one line whitespace change. > ::: tools/profiler/newlib_main.cpp > @@ +164,5 @@ > > + if (lo > hi) { > > + // not found > > + return NULL; > > + } > + long int mid = (lo + hi) / 2; > > This computation can overflow. > http://googleresearch.blogspot.com.au/2006/06/extra-extra-read-all-about-it-nearly.html Interesting. In this case though I don't think it can overflow, because the indexing type is "signed machine word" (long int), giving potential overflow when the table size is 2 ^ (#bits-in-machine-word - 2). But that would require the table to take the entire address space and have an element size of 4 bytes or less. The first condition is unlikely and the second is simply not the case. The google paper refers to the case where the process is 64-bit but the two index values are signed 32 bit, which is indeed asking for trouble. > class TaggedWord { > [..] > You could even think about making mValue and mValid private and > providing getters/setters. Depends on how the type is used. I didn't do that -- what are the criteria for doing so? > + NUWord start_avma; > > What does avma mean? You've used it in several places where I'd > probably just use |p|. I added a comment about the terminology (AVMA, SVMA, Bias, Image address) at the top of LulMain.h. > > + // Acquires m_rwl in w-mode. This must be called only after the > > It took me a minute to work out what w-mode and r-mode were. Are they > standard terms? Would "Acquires m_rwl as a reader/writer" be clearer? "held for reading" or "held for writing" seem to be the de-facto terms. Fixed. > ::: tools/profiler/newlib_platform_macros.h > > +#undef NEWLIB_PLAT_arm_android > > +#undef NEWLIB_PLAT_amd64_linux > > "x64" is typically used these days instead of "amd64". E.g. IonMonkey > does it. Ok .. you want to use x86 and x64 for 32- and 64-bit Intel, then? > > +// No real register has this number. It's convenient to be > > +// able to treat the CFA as "just another register", though. > > +#define DW_REG_CFA (-1) > > What is the "CFA"? It means "Canonical Frame Address" -- a notional fixed point on the stack which makes the unwinding descriptions simpler. The name and concept appear in the Dwarf unwind-info specification and play a central role. I added "(Canonical Frame Address)" at the first appearance of the abbreviation. > > +// An expression. If |srcReg| is DW_REG_CFA (-1) then it denotes the > > What kind of expression? Is this a DWARF concept? I added a pointer to the comment on RuleSet, which explains this.
Attachment #8372965 - Attachment is obsolete: true
Attachment #8378314 - Flags: review?(nfroyd)
Comment on attachment 8378314 [details] [diff] [review] bug938157-newlib-13d.cset Review of attachment 8378314 [details] [diff] [review]: ----------------------------------------------------------------- This is the sort of patch where I *really* wish we had an automatic style formatter; there's so many places where the style is...off, but I'm not sure it's worth taking the time to run through them all and fix. Might not fix up the naming of a lot of things, but at least the formatting would be consistent. This is a first pass through. I need to fortify myself prior to the second run-through. ::: tools/profiler/LulCommon.cpp @@ +90,5 @@ > + > +static UniqueStringUniverse* gUSU = nullptr; > + > +// This isn't threadsafe. > +const UniqueString* ToUniqueString(string str) { This would be reasonably threadsafe if it was instead: static UniqueStringUniverse sUniverse; return sUniverse->FindOrCopy(str); } (C++11 requires threadsafety for such constructs; earlier versions do not require it, but I think all reasonable platforms we run on guarantee threadsafety even so.) ::: tools/profiler/LulCommonExt.h @@ +33,5 @@ > +// Original author: Jim Blandy <jimb@mozilla.com> <jimb@red-bean.com> > + > +// module.h: Define google_breakpad::Module. A Module holds debugging > +// information, and can write that information out as a Breakpad > +// symbol file. Is this a comment from the file from which the code was originally imported? @@ +174,5 @@ > +// Use this to conditionally transfer ownership of a heap-allocated object > +// to the caller, usually on method success. > + > +template <typename T> > +class scoped_ptr { I believe mozilla/Scoped.h (mfbt/Scoped.h) is essentially this; it would be better to use the in-tree copy. @@ +263,5 @@ > +// is guaranteed, either on destruction of the scoped_array or via an explicit > +// reset(). Use shared_array or std::vector if your needs are more complex. > + > +template<typename T> > +class scoped_array { Scoped.h provides this too. @@ +357,5 @@ > +// scoped_ptr_malloc<> is similar to scoped_ptr<>, but it accepts a > +// second template argument, the functor used to free the object. > + > +template<typename T, typename FreeProc = ScopedPtrMallocFree> > +class scoped_ptr_malloc { Scoped.h provides this too. ::: tools/profiler/LulDwarfExt.h @@ +53,5 @@ > +#ifdef __PTRDIFF_TYPE__ > +typedef __PTRDIFF_TYPE__ intptr; > +typedef unsigned __PTRDIFF_TYPE__ uintptr; > +#else > +#error "Can't find pointer-sized integral types." All of these typedefs should really go away, given the existence of stdint.h. @@ +373,5 @@ > + const unsigned char *buffer > + = reinterpret_cast<const unsigned char *>(signed_buffer); > + const uint16 buffer0 = buffer[0]; > + const uint16 buffer1 = buffer[1]; > + if (endian_ == ENDIANNESS_LITTLE) { We should be using mozilla/Endian.h for all of these readers. ::: tools/profiler/LulExidx.cpp @@ +79,5 @@ > +// of the index table entries (pairs). For each entry, it: > +// > +// * calls ExceptionTableInfo::ExtabEntryExtract to copy the bytecode > +// out into an intermediate buffer. > + Nit: should have a // here instead of a blank line. @@ +278,5 @@ > + > + > +static void* Prel31ToAddr(const void* addr) > +{ > + uint32_t offset32 = *reinterpret_cast<const uint32_t*>(addr); Do we know for certain that this is going to be aligned properly? Can we just memcpy(&offset32, addr, sizeof(offset32)); instead? @@ +581,5 @@ > + // (Prel31ToAddr(&entry->addr)) > + // - mapping_addr_ + loading_addr_) & 0x7fffffff > + // produces a SVMA. Adding the text_bias_ gives plausible AVMAs. > + uint32_t addr = (reinterpret_cast<char*>(Prel31ToAddr(&entry->addr)) > + - mapping_addr_ + loading_addr_) & 0x7fffffff; Maybe pull this out into a helper function, BiasEntryAddress, and use it here and below to contain the ugliness? ::: tools/profiler/LulMain.cpp @@ +363,5 @@ > + Seg(uintptr_t lo, uintptr_t hi, bool b) : lo(lo), hi(hi), b(b) {} > + uintptr_t lo; uintptr_t hi; bool b; > + }; > + > + const uintptr_t mAX_UINTPTR_T() { return ~(uintptr_t)0; } You know you can just use UINTPTR_MAX for this instead. @@ +436,5 @@ > + } > + mSecMaps.clear(); > + } > + > + // This can happen with the global lock r-held. Nit: "held for reading" @@ +679,5 @@ > + return false; > + } > + > + private: > + // This can happen with the global lock r-held. Maybe "This can be (only?) be called with global lock held for reading."? @@ +750,5 @@ > + } > + > + RuleSet* Lookup(uintptr_t ia) { > + // FIXME: this is suboptimal on ARM/Thumb. Shift right 1 bit > + // before modding. Can we fix this prior to landing? @@ +799,5 @@ > +LUL::LUL(void (*aLog)(const char*)) > +{ > + mozilla::DebugOnly<int> r = pthread_rwlock_init(&mRWlock, nullptr); > + MOZ_ASSERT(!r); > + r = pthread_rwlock_wrlock(&mRWlock); We should have an AutoPthreadWRLock RAII class for these lock/unlock pairs, here and below. @@ +922,5 @@ > + > + r = pthread_rwlock_unlock(&mRWlock); > + MOZ_ASSERT(!r); > + > + MOZ_ASSERT(0); // FIXME: actually do something Presumably we need to do something here before landing? @@ +925,5 @@ > + > + MOZ_ASSERT(0); // FIXME: actually do something > + > + if (aSize > 0) > + mSegArray->add(aRXavma, aRXavma + aSize - 1, false); Presumably we should hold the read lock for this? @@ +992,5 @@ > + case LExpr::UNKNOWN: > + return TaggedUWord(); > + case LExpr::NODEREF: { > + TaggedUWord tuw = evaluate_reg(aExpr.mReg, aOldRegs, aCFA); > + tuw.add(TaggedUWord((long long int)aExpr.mOffset)); This cast should just be uintptr_t? @@ +997,5 @@ > + return tuw; > + } > + case LExpr::DEREF: { > + TaggedUWord tuw = evaluate_reg(aExpr.mReg, aOldRegs, aCFA); > + tuw.add(TaggedUWord((long long int)aExpr.mOffset)); This too? ::: tools/profiler/LulMain.h @@ +152,5 @@ > + > +class LUL { > +public: > + // Create; supply a logging sink. Initialises the rw-lock. > + LUL(void (*mLog)(const char*)); Nit: aLog. @@ +254,5 @@ > + // > + // The CFICaches themselves are thread-local and can be both read > + // and written when mRWlock is held for reading. It would probably > + // be faster to use the pthread_{set,get}specific functions, but > + // also more difficult. "more difficult"? Why is that? ::: tools/profiler/LulPlatformMacros.h @@ +13,5 @@ > +// Define platform selection macros in a consistent way. The primary > +// factorisation is on (ARCH,OS) pairs ("PLATforms") but ARCH_ and OS_ > +// macros are defined too, since they are sometimes convenient. > + > +#undef NEWLIB_PLAT_amd64_linux All these NEWLIB_PLAT_ macros should have LUL_ prefixes instead. @@ +52,5 @@ > + > + > +// Make sure the standard integer types have expected sizes, since LUL > +// uses them heavily. In particular uintptr_t is used to mean > +// "unsigned machine word". These all seem superfluous, since the stdint.h sizes are indeed defined to be these... @@ +61,5 @@ > +static_assert(sizeof(uint16_t) == 2, "uint16_t has wrong size"); > +static_assert(sizeof(int16_t) == 2, "int16_t has wrong size"); > +static_assert(sizeof(uint8_t) == 1, "uint8_t has wrong size"); > +static_assert(sizeof(unsigned long long int) == 8, > + "unsigned long long int has wrong size"); ...we shouldn't really be relying on this... @@ +66,5 @@ > + > +static_assert(sizeof(uintptr_t) == sizeof(void*), "uintptr_t has wrong size"); > +static_assert(sizeof(size_t) >= sizeof(void*), "size_t has wrong size"); > +static_assert(sizeof(off_t) >= sizeof(void*), "off_t has wrong size"); > +static_assert(sizeof(long int) == sizeof(void*), "long int has wrong size"); ...or this. ::: tools/profiler/LulSecMap.h @@ +221,5 @@ > + ~SecMap(); > + > + // Binary search mRuleSets to find one that brackets |ia|. > + // lo and hi need to be signed, else the loop termination tests > + // don't work properly. Note that this works correctly even when This comment about low and hi seems out of place. ::: tools/profiler/UnwinderThread2.cpp @@ +918,1 @@ > # elif defined(SPS_PLAT_amd64_darwin) Is this all going to build and work on Darwin, too? @@ +949,5 @@ > # endif > > /* Copy up to N_STACK_BYTES from rsp-REDZONE upwards, but not > going past the stack's registered top point. Do some basic > + sanity checks too. This assumes that the TaggedNUWord holding What are TaggedNUWords? @@ +1860,5 @@ > } > } > > static > +void do_newlib_unwind_Buffer(/*OUT*/PCandSP** pairs, do_lul_unwind_Buffer? LulUnwindBuffer? @@ +1884,5 @@ > # else > # error "Unknown plat" > # endif > > + // FIXME: should we reinstate the ability to use separate debug objects? Yes. But that's OK for a followup. @@ +1905,5 @@ > // Set the max number of scanned or otherwise dubious frames > // to the user specified limit > + size_t scannedFramesAllowed > + = ((sUnwindStackScan > 256) ? 256 : (sUnwindStackScan < 0) ? 0 > + : sUnwindStackScan); I swear we have a function for clamping values like this, but I cannot find it. We should be using std::{min,max} here at the very least. @@ +1910,4 @@ > > + // This implies that the max number of frames is 256, so as to avoid > + // the unwinder wasting a lot of time looping on corrupted stacks. > + uintptr_t framePCs[256]; Should probably |static const size_t maxNumberFrames = 256;| above and use it in defining scannedFramesAllowed and this. #define if you need to. @@ +1911,5 @@ > + // This implies that the max number of frames is 256, so as to avoid > + // the unwinder wasting a lot of time looping on corrupted stacks. > + uintptr_t framePCs[256]; > + uintptr_t frameSPs[256]; > + size_t framesAvail = sizeof(framePCs)/sizeof(framePCs[0]); mozilla::ArrayLength(framePCs); @@ +1948,5 @@ > + // (unsigned long long int)frame_amd64->context.rsp, > + // frame_amd64->trust_description().c_str()); > + //} > + (*pairs)[framesUsed-1-frame_index].pc = framePCs[frame_index]; > + (*pairs)[framesUsed-1-frame_index].sp = frameSPs[frame_index]; It looks like you can now get rid of all the #ifdeffery here, since you're just reading things out of framePCs/frameSPs all the time.
Attachment #8378314 - Flags: review?(n.nethercote)
> These are all copied from toolkit/crashreporter/google-breakpad/src, > although none of them are 1:1 copies of files in that subtree. Rather > they are assembled from various bits of various files in there. Can you add comments at the top indicating if they are pure copies or copies-with-modifications? > How to test is still open. This patch doesn't contain any testing > functionality. As we discussed, a combination of high-level system smoke tests (i.e. does it seem to work, not crash, etc) plus some low-level unit tests is probably the best you can easily do. > > ::: tools/profiler/newlib_main.cpp > > @@ +164,5 @@ > > > + if (lo > hi) { > > > + // not found > > > + return NULL; > > > + } > > + long int mid = (lo + hi) / 2; > > > > This computation can overflow. > > http://googleresearch.blogspot.com.au/2006/06/extra-extra-read-all-about-it-nearly.html > > Interesting. In this case though I don't think it can overflow, > because the indexing type is "signed machine word" (long int), giving > potential overflow when the table size is > 2 ^ (#bits-in-machine-word - 2). > But that would require the table to take the entire address space > and have an element size of 4 bytes or less. The first condition is > unlikely and the second is simply not the case. > > The google paper refers to the case where the process is 64-bit but > the two index values are signed 32 bit, which is indeed asking for > trouble. Can you either add a comment about this, or just change the computation to the always-safe version? I can't find it now, but there was a bug recently where somebody audited all the binary search implementations in the mozilla codebase for exactly this bug. (Indeed, that's why I thought to check your binary search implementation.) > > class TaggedWord { > > [..] > > You could even think about making mValue and mValid private and > > providing getters/setters. Depends on how the type is used. > > I didn't do that -- what are the criteria for doing so? If the getters/setters would just be trivial, it's probably not worth doing. But if there's some kind of assertion that could be inserted (such as a tag check) then it probably is worth doing. Alternatively, if there would be trivial getter but no setter (i.e. it's read-only after construction) then you should make the field const and the getter probably isn't necessary. > Ok .. you want to use x86 and x64 for 32- and 64-bit Intel, then? Yes please.
froydnj, you made several comments about modifying the code taken from Breakpad to make it follow Mozilla style. jseward and I discussed this and figured it wasn't worthwhile because it would cause unhelpful divergence with the other copy of Breakpad in the tree. Hope this sounds reasonable.
Comment on attachment 8378314 [details] [diff] [review] bug938157-newlib-13d.cset Review of attachment 8378314 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for the delay. I didn't look at UnwinderThread2.cpp; I figure Benoit needs to review that. I also barely looked at all the files that were taken from Breakpad and libunwind (i.e. the ones with Google or Linaro copyright notices at the top). I also only got partway through LulMain.cpp. I'll do the rest tomorrow. Would it make sense to create a tools/profiler/lul/ directory and remove the "Lul" prefix from all these files? I'll let you decide. ::: tools/profiler/LulCommonInt.h @@ +6,5 @@ > + > +#ifndef LulCommonInt_h > +#define LulCommonInt_h > + > +// There is no internal-only component; everything is exported. In which case, just get rid of this file, and rename LulCommonExt.h as LulCommon.h? ::: tools/profiler/LulDwarfSummariser.cpp @@ +22,5 @@ > + mCurrAddr = 0; > + mMax1Addr = 0; // Gives empty range > + > + // Initialise the running RuleSet to "haven't got a clue" status. > + (void) new (&mCurrRules) RuleSet(); Placement new doesn't need a |(void)| cast. @@ +33,5 @@ > + char buf[100]; > + snprintf(buf, sizeof(buf), "ZZ Entry(%llx, %llu)\n", > + (unsigned long long int)aAddress, > + (unsigned long long int)aLength); > + buf[sizeof(buf)-1] = 0; Wow, you _are_ paranoid :) @@ +41,5 @@ > + // that the previous summary, if any, has been properly finished > + // by a call to ::End(). > + mCurrAddr = aAddress; > + mMax1Addr = aAddress + aLength; > + (void) new (&mCurrRules) RuleSet(); Ditto about |(void)|. @@ +45,5 @@ > + (void) new (&mCurrRules) RuleSet(); > +} > + > +void Summariser::Rule(uintptr_t aAddress, > + int aNewReg, int aOldReg, long aOffset, bool aDeref) Use intptr_t instead of long for |aOffset|? @@ +47,5 @@ > + > +void Summariser::Rule(uintptr_t aAddress, > + int aNewReg, int aOldReg, long aOffset, bool aDeref) > +{ > + char buf[100]; Move this declaration inside the |if (DEBUG_SUMMARISER)| block. @@ +53,5 @@ > + if (DEBUG_SUMMARISER) { > + snprintf(buf, sizeof(buf), > + "ZZ 0x%llx old-r%d = %sr%d + %ld%s\n", > + (unsigned long long int)aAddress, aNewReg, > + aDeref ? "*(" : "", aOldReg, aOffset, aDeref ? ")" : ""); Can you use a more meaningful marker than "ZZ"? "LUL"? @@ +61,5 @@ > + if (mCurrAddr < aAddress) { > + // Flush the existing summary first. > + mCurrRules.mAddr = mCurrAddr; > + // FIXME: guard against overflow > + mCurrRules.mLen = aAddress - mCurrAddr; That sounds like a good idea... @@ +84,5 @@ > + > + case DW_REG_CFA: > + // This is a rule that defines the CFA. The only forms we > + // choose to represent are: r7/11/12/13 + offset. > + if (aDeref || aOffset != (long)(int32_t)aOffset) On ARM, won't long be equivalent to int32_t? In which case the second test can't fail? @@ +163,5 @@ > + // is the heart of the summarisation process. > + if (aNewReg == DW_REG_CFA) { > + // This is a rule that defines the CFA. The only forms we can > + // represent are: = SP+offset or = FP+offset. > + if (!aDeref && aOffset == (long)(int32_t)aOffset Again with this test -- on 32-bit I can't see how it would fail, and on 64-bit it can fail but I don't see the point of truncating a long to 32 bits. @@ +185,5 @@ > + switch (aNewReg) { > + case DW_REG_INTEL_XBP: mCurrRules.mXbpExpr = expr; break; > + case DW_REG_INTEL_XSP: mCurrRules.mXspExpr = expr; break; > + case DW_REG_INTEL_XIP: mCurrRules.mXipExpr = expr; break; > + default: MOZ_ASSERT(0); You could do |MOZ_CRASH("bad reg");| instead if you want to be safe in non-debug builds. @@ +223,5 @@ > + mLog("ZZ End\n"); > + if (mCurrAddr < mMax1Addr) { > + mCurrRules.mAddr = mCurrAddr; > + // FIXME: guard against overflow > + mCurrRules.mLen = mMax1Addr - mCurrAddr; Yeah... ::: tools/profiler/LulDwarfSummariser.h @@ +13,5 @@ > + > +class Summariser > +{ > + > +public: Nit: unnecessary blank line. @@ +37,5 @@ > + // construction could possibly apply. If there are no further > + // incoming events then mCurrRules will eventually be emitted > + // as-is, for the range mCurrAddr.. mMax1Addr - 1, if that is > + // nonempty. > + uintptr_t mMax1Addr; FYI, SpiderMonkey has a convention where the word "limit" is used for max+1 values. You don't have to use it, just thought you might be interested. @@ +43,5 @@ > + // The bias value (to add to the SVMAs, to get AVMAs) to be used > + // when adding entries into mSecMap. > + uintptr_t mTextBias; > + > + // A logging sink, for debugging Full stop. ::: tools/profiler/LulMain.cpp @@ +35,5 @@ > + > +static const char* name_DW_REG(int16_t aReg) { > + switch (aReg) { > + case DW_REG_CFA: return "cfa"; > +# if defined(NEWLIB_ARCH_amd64) || defined(NEWLIB_ARCH_x86) Don't indent the |if|, and likewise elsewhere. @@ +95,5 @@ > + res += show_rule(" R15", mR15expr); > + res += show_rule(" R7", mR7expr); > + res += show_rule(" R11", mR11expr); > + res += show_rule(" R12", mR12expr); > + res += show_rule(" R13", mR13expr); Any reason for the non-obvious ordering here? @@ +137,5 @@ > +//////////////////////////////////////////////////////////////// > +// SecMap // > +//////////////////////////////////////////////////////////////// > + > +// See header file LulSecMap.h for comments about invariants. It's a bit weird having LulSecMap.h but no LulSecMap.cpp, and then defining things in this file. Mozilla .h files very often have corresponding .cpp files. @@ +150,5 @@ > + > +SecMap::~SecMap() { > + mRuleSets.clear(); > + // Is this pointful? > + vector<RuleSet>().swap(mRuleSets); I don't think it's pointful. AIUI, the latter works if you want to destroy the elements but then keep using it, but v.clear() is the more obvious alternative. But since the vector gets destructed anyway, all this is unnecessary. @@ +155,5 @@ > + free(mFileName); > +} > + > +RuleSet* SecMap::FindRuleSet(uintptr_t ia) { > + // binary search mRuleSets to find one that brackets |ia|. Nit: capitalize 'b' in "Binary". @@ +192,5 @@ > +} > + > + > +static > +bool cmp_RuleSets_le_by_addr(const RuleSet& rs1, const RuleSet& rs2) { Oh, a Moz style thing that you're breaking all over the place: function names should always be at the start of their own line, viz: static bool cmp_RuleSets_le_by_addr(const RuleSet& rs1, const RuleSet& rs2) { Please fix! (But note that this rule isn't the case for function definition/declarations inside classes.) Also, CamelCaps function names, please. @@ +204,5 @@ > + if (mRuleSets.empty()) > + return; > + > + if (aLen == 0) { > + // This should never happen So add an assertion or MOZ_CRASH? BTW, missing full stop. @@ +209,5 @@ > + mRuleSets.clear(); > + return; > + } > + > + // sort by start addresses Proper sentences, plz. @@ +217,5 @@ > + // Set its length to zero, so that the next pass will remove it. > + for (size_t i = 0; i < mRuleSets.size(); i++) { > + RuleSet* rs = &mRuleSets[i]; > + if (rs->mLen > 0 > + && (rs->mAddr < aStart || rs->mAddr + rs->mLen > aStart + aLen)) { && at end of previous line. @@ +272,5 @@ > + > + MOZ_ASSERT(mRuleSets.size() == j); > + } > + > + // final check on for ascending, non overlapping, non zero sized Proper sentence, plz. @@ +278,5 @@ > + if (n > 0) { > + MOZ_ASSERT(mRuleSets[0].mLen > 0); > + for (size_t i = 1; i < n; i++) { > + mozilla::DebugOnly<RuleSet*> prev = &mRuleSets[i-1]; > + mozilla::DebugOnly<RuleSet*> here = &mRuleSets[i]; I'd wrap this whole bit in |#ifdef DEBUG| since it's a loop, and then you don't need to use DebugOnly<>. @@ +343,5 @@ > + preen(); > + } > + > + bool getBoundingCodeSegment(/*OUT*/uintptr_t* rx_min, > + /*OUT*/uintptr_t* rx_max, uintptr_t addr) { Why are the outparams first? @@ +360,5 @@ > + > + private: > + struct Seg { > + Seg(uintptr_t lo, uintptr_t hi, bool b) : lo(lo), hi(hi), b(b) {} > + uintptr_t lo; uintptr_t hi; bool b; One line per field. And what is |b| for? A better name and/or comment would be good. @@ +367,5 @@ > + const uintptr_t mAX_UINTPTR_T() { return ~(uintptr_t)0; } > + > + void preen() { > + for (std::vector<Seg>::iterator it = mSegs.begin(); > + it < mSegs.end()-1; it++) { Put the |it++| on a separate line. @@ +382,5 @@ > + std::vector<Seg>::size_type find(uintptr_t a) { > + long int lo = 0; > + long int hi = (long int)mSegs.size(); > + while (true) { > + // The unsearched space is lo .. hi inclusive Full stop. @@ +387,5 @@ > + if (lo > hi) { > + // Not found. This can't happen. > + return (std::vector<Seg>::size_type)(-1); > + } > + long int mid = (lo + hi) / 2; Again, comment why this is safe, or change it to the always-safe computation. @@ +398,5 @@ > + } > + > + void split_at(uintptr_t a) { > + std::vector<Seg>::size_type i = find(a); > + if (mSegs[i].lo == a) Always brace single-line blocks. @@ +408,5 @@ > + > + void show() { > + printf("<< %d entries:\n", (int)mSegs.size()); > + for (std::vector<Seg>::iterator it = mSegs.begin(); > + it < mSegs.end(); it++) { |it++| on its own line. @@ +848,5 @@ > + r = pthread_rwlock_unlock(&mRWlock); > + MOZ_ASSERT(!r); > +} > + > +void LUL::NotifyMap(uintptr_t aRXavma, size_t aSize, off_t aOffset, aOffset is unused. Remove it? ::: tools/profiler/LulMain.h @@ +10,5 @@ > +#include <pthread.h> // rwlock_t > +#include <sys/types.h> // off_t > + > +#include <vector> > +#include <map> Are these two needed? I don't see anything in this file that requires them. Maybe they're bootlegged elsewhere, in which case moving them to the appropriate files would be better. @@ +25,5 @@ > +// difficult to understand and maintain. LUL tries hard to avoid > +// using the word "address" and instead uses the following more > +// precise terms: > +// > +// * SVMA ("Stated VMA"): this is an address of a symbol (etc) as it This is a nice comment, but you don't explain what "VMA" is. Virtual memory address, I guess? Please make it explicit. @@ +28,5 @@ > +// > +// * SVMA ("Stated VMA"): this is an address of a symbol (etc) as it > +// is stated in the symbol table, or other metadata, of an object. > +// Such values are typically small and start from zero or > +// thereabouts, unless the object has been prelinked. Does that mean it's really an offset? @@ +42,5 @@ > +// number of pages. Once we know the bias for a given object's > +// text section (for example), we can compute the AVMAs of all of > +// its text symbols by adding the bias to their SVMAs. > +// > +// * "Image address": typically, to read debuginfo from an object we Any reason this isn't called the "Image VMA"? @@ +56,5 @@ > + > +// A machine word plus validity tag. > +class TaggedUWord { > +public: > + // Construct a valid one Full stop at the end of sentences, plz, here and below. @@ +82,5 @@ > + bool valid() const { return mValid; } > + > +private: > + uintptr_t mValue; > + bool mValid; Can these fields be const? @@ +108,5 @@ > + > + > +// CONFIGURABLE > +// The maximum number of bytes in a stack snapshot > +static const size_t N_STACK_BYTES = 32768; What does "CONFIGURABLE" mean? At first I thought it meant you could change this via ./configure, but that's clearly not the case. I guess you just mean that this is a number that could sensibly be changed? (Full stop at end of the sentence, too.) @@ +113,5 @@ > + > +// The stack chunk image that will be unwound. > +struct StackImage { > + // [start_avma, +len) specify the address range in the buffer. > + // Obviously we require 0 <= len <= NEWLIB_N_STACK_BYTES. s/NEWLIB_//. (You should just |grep -i| for newlib through this whole patch, to find the places you missed.) @@ +132,5 @@ > +// therefore require only a read-lock. Hence multiple threads can > +// unwind in parallel. > +// > +// The library needs to maintain state which is private to each > +// unwinder thread -- the CFI fast cache. Hence unwinder threads Define "CFI" here? @@ +136,5 @@ > +// unwinder thread -- the CFI fast cache. Hence unwinder threads > +// first need to register with the library, so their identities are > +// known. Also, for maximum effectiveness of the CFI caching, it is > +// preferable to have a small number of very-busy unwinder threads > +// rather than a large number of mostly-idle unwinder threads. Is write-starvation an issue? E.g. if we're frequently reading, we might not get the write-lock when required. If so, what would the consequences be? @@ +143,5 @@ > +// handler, since this risks deadlock. In particular this means > +// a thread may not unwind itself from within a signal handler > +// frame. It might be safe to call ::Unwind on its own stack > +// from not-inside a signal frame, although even that cannot be > +// guaranteed deadlock free. If you allow 80-char lines instead of restricting them to 72, you can get a lot more words on each line :) @@ +155,5 @@ > + // Create; supply a logging sink. Initialises the rw-lock. > + LUL(void (*mLog)(const char*)); > + > + // Destroy. mRWlock must be held for writing. By doing that, waits > + // for all unwinder threads to finish any ::Unwind calls they may be Nit: in C++, |::Foo| refers to an identifier |Foo| which is in the global namespace. I'd write this as Unwind() or |Unwind| or just Unwind. @@ +164,5 @@ > + // Notify of a new r-x mapping, and load the associated unwind info. > + // The filename is strdup'd and used for debug printing. If > + // mapped_image is NULL, this function will mmap/munmap the file > + // itself, so as to be able to read the unwind info. If > + // mapped_image is non-NULL then it is assumed to point to a s/mapped_image/aMappedImage/, more than once. @@ +177,5 @@ > + // associated unwind info. Acquires mRWlock for writing. Note that > + // to avoid segfaulting the stack-scan unwinder (which inspects code > + // areas), this must be called before the code area in question is > + // really unmapped. > + void NotifyUnmap(uintptr_t aAvma, size_t aSize); Is it worth calling them NotifyAfterMap() and NotifyBeforeUnmap(), or something like that, to make the ordering requirement clearer? @@ +212,5 @@ > + // > + // The calling thread must previously have registered itself via > + // RegisterUnwinderThread. Returns false, and does not unwind, if > + // the calling thread is not registered for unwinding. Otherwise > + // returns true. I'd probably call MOZ_CRASH() if it's not registered, and make the return type void. No point in legitimizing silly sequences. @@ +217,5 @@ > + bool Unwind(/*OUT*/uintptr_t* aFramePCs, /*OUT*/uintptr_t* aFrameSPs, > + /*OUT*/size_t* aFramesUsed, size_t aFramesAvail, > + size_t aScannedFramesAllowed, > + /*OUT*/size_t* aScannedFramesAcquired, > + UnwindRegs* aStartRegs, StackImage* aStackImg); The argument order here is... interesting. How about this? aStackImg, aStartRegs, aFramesAvail, aScannedFramesAllowed, /*OUT*/aFramesUsed, /*OUT*/aFramePCs, /*OUT*/aFrameSPs, /*OUT*/aScannedFramesAcquired Speaking of which, you haven't commented every argument, and I don't understand why you need both |aFramesAvail| & |aScannedFramesAllowed|, and |aFramesUsed| & |aScannedFramesAcquired|. At first guess I'd think those four could be reduced to two. @@ +225,5 @@ > + void (*mLog)(const char*); > + > +private: > + // Invalidate the caches. Requires mRWlock to be held for writing; > + // does not acquire it itself. Is this requirement asserted? Worth adding the assertion if it's not, and mentioning it here. @@ +236,5 @@ > + // unwind info. Basically a sorted array of (addr, len, info) > + // records. Threads wishing to query this field must hold mRWlock > + // for reading. Threads wishing to modify this field must hold > + // mRWlock for writing. > + PriMap* mPriMap; If this is a private field, how do threads query/modify it? @@ +238,5 @@ > + // for reading. Threads wishing to modify this field must hold > + // mRWlock for writing. > + PriMap* mPriMap; > + > + Nit: extra blank line. @@ +254,5 @@ > + // > + // The CFICaches themselves are thread-local and can be both read > + // and written when mRWlock is held for reading. It would probably > + // be faster to use the pthread_{set,get}specific functions, but > + // also more difficult. How important is speed? How often is this accessed? ::: tools/profiler/LulPlatformMacros.h @@ +52,5 @@ > + > + > +// Make sure the standard integer types have expected sizes, since LUL > +// uses them heavily. In particular uintptr_t is used to mean > +// "unsigned machine word". I agree; these assertions aren't necessary. ::: tools/profiler/LulSecMap.h @@ +36,5 @@ > + DW_REG_ARM_R13 = 13, > + DW_REG_ARM_R14 = 14, > + DW_REG_ARM_R15 = 15, > +# elif defined(NEWLIB_ARCH_amd64) > + // Because the X86 (32 bit) and AMD64 (64 bit) summarisers are Nit: s/X86/x86/ and s/AMD64/x64/. @@ +59,5 @@ > +// An expression -- very primitive. Denotes either "register + > +// offset" or a dereferenced version of the same. So as to allow > +// convenient handling of Dwarf-derived unwind info, the register may > +// also denote the CFA. A large number of these need to be stored, so > +// there is some effort to make sure it fits into 8 bytes. See I'd call a static_assert more than "some effort" :) Say "we ensure", or something like that. @@ +64,5 @@ > +// comment below on RuleSet to see how expressions fit into the bigger > +// picture. > + > +struct LExpr { > + // Denotes an expression with no value Full stops needed, here and below. @@ +75,5 @@ > + // Denotes any expressible expression > + LExpr(uint8_t how, int16_t reg, int32_t offset) > + : mOffset(offset) > + , mReg(reg) > + , mHow(how) Having differing orders for the fields and the arguments is odd. Make them consistent? @@ +82,5 @@ > + // Change the offset for an expression that references memory. > + LExpr add_delta(long delta) > + { > + MOZ_ASSERT(mHow == NODEREF); > + return (mHow == NODEREF) ? LExpr(mHow, mReg, mOffset+delta) You assert |mHow == NODEREF| and then immediately check it. At first I thought something's wrong, but I guess you're being doubly-defensive? A comment about this would be good. @@ +90,5 @@ > + // Dereference an expression that denotes a memory address. > + LExpr deref() > + { > + MOZ_ASSERT(mHow == NODEREF); > + return (mHow == NODEREF) ? LExpr(DEREF, mReg, mOffset) Ditto. @@ +104,5 @@ > + DEREF }; // Value is *(mReg + mOffset) > + > + int32_t mOffset; // 32-bit signed offset should be more than enough > + int16_t mReg; // A DW_REG_ value > + uint8_t mHow; // UNKNOWN, NODEREF or DEREF Could these fields be |const|? @@ +131,5 @@ > +// |mReg| == DW_REG_CFA since we have no previous value for the CFA. > +// All of the other |Expr| fields can -- and usually do -- specify > +// |mReg| == DW_REG_CFA. > +// > +// With that in place, the unwind algorithm proceeds as follows. Nit: I'd put '-' or a number before each step, so it's clearer when the algorithm steps finish. @@ +161,5 @@ > + RuleSet(); > + void print(void(*log)(const char*)); > + > + // Find the LExpr* for a given DW_REG_ value in this class. > + LExpr* exprForRegno(int regno); If you give the DW_REG_ enum a name, you could use its type for the argument here. @@ +191,5 @@ > +// SecMap // > +//////////////////////////////////////////////////////////////// > + > +// A SecMap may have zero address range, temporarily, whilst RuleSets > +// are being added to it. That's OK. But adding a zero-range SecMap Nit: "That's OK." is redundant w.r.t. the preceding sentence. @@ +244,5 @@ > + uintptr_t mSummaryMaxAddr; > + > +private: > + // False whilst adding entries; true once it is safe to call FindRuleSet. > + // Transition (false->true) is caused by calling ::PrepareRuleSets. Misuse of ::Foo again. @@ +247,5 @@ > + // False whilst adding entries; true once it is safe to call FindRuleSet. > + // Transition (false->true) is caused by calling ::PrepareRuleSets. > + bool mUsable; > + > + // A vector of RuleSets, sorted, nonoverlapping (post ::Prepare). Ditto for ::Prepare. @@ +252,5 @@ > + std::vector<RuleSet> mRuleSets; > + > + // Name of the associated file. Not sure that we really need > + // to store this. > + char* mFileName; If you make this a ScopedFreePtr (see mfbt/ScopedPtr.h) you don't need to free it in the destructor. But as the comment suggests, it's not used! Just remove it. ::: tools/profiler/moz.build @@ +46,5 @@ > 'local_debug_info_symbolizer.cc', > ] > > if CONFIG['OS_TARGET'] in ('Android', 'Linux'): > + # These files cannot be built in unified mode because of name clashes with mozglue headers on Android. By changing SOURCES, you've changed the meaning of "These files". If the LUL files don't can't problems with unified builds, then they should be added to UNIFIED_SOURCES instead of SOURCES. (Unified builds are ones where we concatenate up to 16 .cpp files together before compiling, which speeds up compilation a lot. Some files break when you do this due to conflicts, hence the SOURCES/UNIFIED_SOURCES distinction.) @@ +59,5 @@ > ] > if CONFIG['CPU_ARCH'] == 'arm': > SOURCES += [ > 'EHABIStackWalk.cpp', > + 'LulExidx.cpp', Ditto here.
Comment on attachment 8378314 [details] [diff] [review] bug938157-newlib-13d.cset Review of attachment 8378314 [details] [diff] [review]: ----------------------------------------------------------------- OK, if we're just letting things get cut-and-pasted in from breakpad wholesale, then reviewing this patch is a lot simpler! ::: tools/profiler/LulMain.cpp @@ +214,5 @@ > + std::sort(mRuleSets.begin(), mRuleSets.end(), cmp_RuleSets_le_by_addr); > + > + // Detect any entry not completely contained within [start, +len). > + // Set its length to zero, so that the next pass will remove it. > + for (size_t i = 0; i < mRuleSets.size(); i++) { ++i always, here and elsewhere. @@ +367,5 @@ > + const uintptr_t mAX_UINTPTR_T() { return ~(uintptr_t)0; } > + > + void preen() { > + for (std::vector<Seg>::iterator it = mSegs.begin(); > + it < mSegs.end()-1; it++) { And make it |++it| for efficiency reasons. Here and elsewhere. @@ +533,5 @@ > + // Almost all of the remaining cases that occur in practice are > + // of the form CALL *simm8(reg) or CALL *simm32(reg). > + // Unfortunately with somewhat irregular encoding. Currently > + // these are not detected, although it would be easy enough to > + // do so. It looks like you handle them for x86, though, why not handle them here? Why not unify these blocks? |CALL simm32(%rip)| is the only thing that's x64-specific, and that's easy enough to separate out. @@ +1096,5 @@ > + TaggedUWord last_valid_sp = TaggedUWord(); > + > + // Stack-scan control > + unsigned int n_scanned_frames = 0; // # s-s frames recovered so far > + const int nUM_SCANNED_WORDS = 50; // max allowed scan length This is an odd way to write the name, though I see why you're doing it. Why not just: static const int MAX_SCANNED_WORDS = 50; @@ +1155,5 @@ > + last_valid_sp = sp; > + } else { > + MOZ_ASSERT(last_valid_sp.valid()); > + if (sp.valid()) { > + if (sp.value() < last_valid_sp.value()) { Do we care about making this portable to architectures where stacks grow up? @@ +1324,5 @@ > + // (2) the current (callee's) RBP will point at that word: > + regs.xbp = deref_TUW(regs.xbp, aStackImg); > + } > + else > + if (regs.xbp.valid() |else| cuddles with braces and |if| likewise, so: |} else if (...) { @@ +1344,5 @@ > + sp.add(TaggedUWord(8)); > + > +# elif defined(NEWLIB_ARCH_x86) > + // This is exactly the same logic as for amd64, except at > + // half the word size. Then can we just have a: #if defined(NEWLIB_ARCH_x64) TaggedUWord addressSize = TaggedUWord(8) #else TaggedUWord addressSize = TaggedUWord(4) #endif or similar and unify these two code blocks? @@ +1436,5 @@ > + > + > +void LUL::InvalidateCFICaches() > +{ > + // NB: the caller must hold m_rwl in w-mode. Nit: "held for writing"? ::: tools/profiler/LulSecMap.h @@ +158,5 @@ > +class RuleSet { > +public: > + > + RuleSet(); > + void print(void(*log)(const char*)); Nit: aLog. @@ +161,5 @@ > + RuleSet(); > + void print(void(*log)(const char*)); > + > + // Find the LExpr* for a given DW_REG_ value in this class. > + LExpr* exprForRegno(int regno); Nit: aRegno. ::: tools/profiler/UnwinderThread2.cpp @@ +1647,5 @@ > + // just create the unwinder library right now, and get it to read > + // all the unwind information. > + LOG("unwind_thr_fn: START"); > + MOZ_ASSERT(!sLUL); > + sLUL = new lul::LUL(logging_sink_for_LUL); Nit: indentation. @@ +1658,1 @@ > /* If we're the first thread in, we'll need to allocate the buffer Um. So if multiple threads can get here, we're allocating multiple copies of LUL above. That seems bad. We should protect initialization of sLUL under g_spinLock? Or does this comment need changing for the new world order?
Attachment #8378314 - Flags: review?(nfroyd) → feedback+
Comment on attachment 8378314 [details] [diff] [review] bug938157-newlib-13d.cset Review of attachment 8378314 [details] [diff] [review]: ----------------------------------------------------------------- Ok, I think this is good enough to give r+ -- I don't feel the need to see it again after all the comments have been addressed. It definitely needs some tests before landing, though. ::: tools/profiler/LulMain.cpp @@ +443,5 @@ > + return sm ? sm->FindRuleSet(ia) : nullptr; > + } > + > + // Add a secondary map. No overlaps allowed w.r.t. existing > + // secondary maps. Global lock must be w-held. "held for writing". @@ +463,5 @@ > + for (i = 0; i < num_secMaps; i++) { > + SecMap* sm_i = mSecMaps[i]; > + MOZ_ASSERT(sm_i->mSummaryMinAddr <= sm_i->mSummaryMaxAddr); > + if (aSecMap->mSummaryMinAddr < sm_i->mSummaryMaxAddr) { > + // we need to insert it before mSecMaps[i] Proper sentence, plz. @@ +489,5 @@ > + } > + > + // Assess heuristically whether the given address is an instruction > + // immediately following a call instruction. The caller is required > + // to r-hold the global lock. "hold for reading". @@ +507,5 @@ > + uintptr_t insns_min, insns_max; > + bool b = aSegArray->getBoundingCodeSegment(&insns_min, &insns_max, ia); > + if (!b) > + // no code (that we know about) at this address > + return false; Brace the if-block. @@ +606,5 @@ > + // call *simm32(%edi) FF97 simm32 > + // > + if (ia - 3 >= insns_min > + && p[-3] == 0xFF > + && (p[-2] >= 0x50 && p[-2] <= 0x57 && p[-2] != 0x54)) { Moz style is to put the && on the preceding line (https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style#Operators). You do this in a bunch of places. @@ +681,5 @@ > + > + private: > + // This can happen with the global lock r-held. > + SecMap* FindSecMap(uintptr_t ia) { > + // binary search mSecMaps to find one that brackets |ia|. s/b/B/ @@ +726,5 @@ > +// It has to distinguish 3 cases: > +// > +// (1) .second == (RuleSet*)0 ==> cache slot not in use > +// (2) .second == (RuleSet*)1 ==> slot in use, no RuleSet avail > +// (3) .second > (RuleSet*)1 ==> slot in use, RuleSet* available Please create static const members for the 0 and 1 special cases. @@ +779,5 @@ > + return fallback; > + } > + > + private: > + // This should be prime Full stop. @@ +783,5 @@ > + // This should be prime > + static const int N_ENTRIES = 509; > + > + // See comment above for the meaning of these entries. > + std::pair<uintptr_t,RuleSet*> mCache[N_ENTRIES]; I'd introduce a new private struct for this rather than using pair<>; that way you'll get nicer names than |first| and |second|. @@ +838,5 @@ > + pthread_t me = pthread_self(); > + CFICache* cache = new CFICache(mPriMap); > + > + std::pair<std::map<pthread_t,CFICache*>::iterator, bool> > + res = mCaches.insert( std::pair<pthread_t,CFICache*>(me, cache) ); That's a pretty gruesome type! @@ +856,5 @@ > + mozilla::DebugOnly<int> r = pthread_rwlock_wrlock(&mRWlock); > + MOZ_ASSERT(!r); > + > + mLog(":\n"); > + char buf[200]; You use buf[100] for a similar case below... 200 is probably overkill here :) @@ +875,5 @@ > + // ReadCFI(smap) .. > + if (!aMappedImage) { > + (void)lul::ReadSymbolData( > + string(aFileName), std::vector<string>(), smap, > + (void*)aRXavma, mLog); What'll happen if this fails? @@ +943,5 @@ > + return res; > +} > + > + > +static TaggedUWord deref_TUW(TaggedUWord aAddr, StackImage* aStackImg) s/deref_TUW/DerefTUW/. @@ +958,5 @@ > + = TaggedUWord(*(uintptr_t*)(aStackImg->mContents + aAddr.value() > + - aStackImg->mStartAvma)); > + return res; > + fail: > + return TaggedUWord(); I'd probably remove the goto and just inline this return statement in the three places. @@ +1073,5 @@ > + mozilla::DebugOnly<int> r = pthread_rwlock_rdlock(&mRWlock); > + MOZ_ASSERT(!r); > + > + pthread_t me = pthread_self(); > + std::map<pthread_t, CFICache*>::iterator it = mCaches.find(me); |iter| is a more typical name for iterators in Moz code, and a better one too, IMO. (Here and elsewhere.) @@ +1076,5 @@ > + pthread_t me = pthread_self(); > + std::map<pthread_t, CFICache*>::iterator it = mCaches.find(me); > + > + if (it == mCaches.end()) { > + r = pthread_rwlock_unlock(&mRWlock); An RAII class to auto-unlock on exit would be really good here, because the function is so large and its control flow is complex. And then you could use the same class elsewhere. @@ +1255,5 @@ > + continue; > + } > + //fprintf(stderr, "%xxx 08lx %02x %02x %02x %02x %02x %02x\n", > + // eip, eipC[-2], eipC[-1], eipC[0], > + // eipC[1], eipC[2], eipC[3]); Remove this, or DEBUG_MAIN it? @@ +1279,5 @@ > + // it's possible to get anywhere by stack-scanning. > + > + // Use stack scanning frugally > + if (n_scanned_frames++ >= aScannedFramesAllowed) > + break; Brace if-bodies. Lots of examples around this code.
Attachment #8378314 - Flags: review?(n.nethercote) → review+
Comment on attachment 8378314 [details] [diff] [review] bug938157-newlib-13d.cset Benoit, requesting review on the changes to the files UnwinderThread2.cpp platform.cpp only. Approximate changes here: * lots of breakpad-glue code has been removed * and a few calls to the LUL interface in LulMain.h have been added * LUL is requested to read the unwind info for all objects in the process when the unwinder thread is started. This is different (and IMO simpler) than the breakpad case, where debuginfo for each object is read incrementally as the unwinder visits each object for the first time. * get_installation_lib_dir and temporarily_map_then_Notify are not new code. They have been copied in from local_debug_info_symbolizer.cc (get_installation_lib_dir and ReadSymbolData_ANDROID) and the old copies are probably redundant now (followup bug to clean up). They also contain a modified of Jim Chen's fix for bug 959931.
Attachment #8378314 - Flags: review?(bgirard)
Comment on attachment 8378314 [details] [diff] [review] bug938157-newlib-13d.cset Review of attachment 8378314 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/UnwinderThread2.cpp @@ +206,5 @@ > // Do a store memory barrier. > static void do_MBAR(); > > > +// FORWARDS This isn't a forward declare. Not sure what the point of the comment is. @@ +1429,5 @@ > } > > + > +#if defined(SPS_OS_android) && !defined(MOZ_WIDGET_GONK) > +// A helper function for temporarily_map_then_Notify(), which This whole hunk doesn't belong in the UnwinderThread IMO. @@ +1472,5 @@ > +// as-is to faulty.lib. > +// > +// Hence only (2) requires special-casing here. > +// > +bool temporarily_map_then_Notify(lul::LUL* aLUL, I feel like this is doing two things. Mapping the library in memory and passing it to LUL. Perhaps we should have a more generic 'give me a mapping of this library' instead. @@ +1568,5 @@ > + image = __dl_mmap(hdl, nullptr, sz, 0); > + if (image == MAP_FAILED) { > + dlclose(hdl); > + char buf[300]; > + snprintf(buf, sizeof(buf), Can we use a function or macro for this? Hopefully even reuse one of the ones we already have. Or have mLog handle this? ::: tools/profiler/platform.cpp @@ +360,5 @@ > (int)sProfileEntries); > LOGF("SPS: UnwindStackScan = %d (max dubious frames per unwind).", > (int)sUnwindStackScan); > LOG( "SPS: Use env var MOZ_PROFILER_MODE=help for further information."); > + LOG( "SPS: Note that MOZ_PROFILER_MODE=help sets all values to defaults."); This log is on by default on b2g build. We should make sure it's not compiled for non profiling builds.
Attachment #8378314 - Flags: review?(bgirard) → feedback+
Attached patch bug938157-lul-14d.cset (obsolete) (deleted) — Splinter Review
Addresses all review comments so far, except: * Add comments at the top of each file explaining origins * Add testing facilities * Benoit's comments in comment 33
Attachment #8378314 - Attachment is obsolete: true
(In reply to Nathan Froyd (:froydnj) from comment #26) > Comment on attachment 8378314 [details] [diff] [review] > bug938157-newlib-13d.cset ! @@ +581,5 @@ ! > + // (Prel31ToAddr(&entry->addr)) ! > + // - mapping_addr_ + loading_addr_) & 0x7fffffff ! > + // produces a SVMA. Adding the text_bias_ gives plausible AVMAs. ! > + uint32_t addr = (reinterpret_cast<char*>(Prel31ToAddr(&entry->addr)) ! > + - mapping_addr_ + loading_addr_) & 0x7fffffff; ! ! Maybe pull this out into a helper function, BiasEntryAddress, and use ! it here and below to contain the ugliness? I didn't do this in the end -- wouldn't have made it much shorter. I did however rename |addr| and |next_addr| to |svma| and |next_svma|, which makes the code below it easier to understand. ! @@ +750,5 @@ ! > + } ! > + ! > + RuleSet* Lookup(uintptr_t ia) { ! > + // FIXME: this is suboptimal on ARM/Thumb. Shift right 1 bit ! > + // before modding. The comment isn't correct. I rm'd it. ! @@ +254,5 @@ ! > + // The CFICaches themselves are thread-local and can be both read ! > + // and written when mRWlock is held for reading. It would probably ! > + // be faster to use the pthread_{set,get}specific functions, but ! > + // also more difficult. ! ! "more difficult"? Why is that? On looking at the pthread_ TLS functions (_key_create, _{set,get}specific), maybe it wouldn't be more difficult. It would make this all more pthread_ dependant, though: as it stands the only pthread dependence is reader-writer locks and _create/_join.
(In reply to Nicholas Nethercote [:njn] from comment #29) > Comment on attachment 8378314 [details] [diff] [review] > bug938157-newlib-13d.cset ! @@ +95,5 @@ ! > + res += show_rule(" R15", mR15expr); ! > + res += show_rule(" R7", mR7expr); ! > + res += show_rule(" R11", mR11expr); ! > + res += show_rule(" R12", mR12expr); ! > + res += show_rule(" R13", mR13expr); ! ! Any reason for the non-obvious ordering here? R15 is the program counter/return address, and it's put at the start of the summary line so as to be consistent with printing for the other architectures. ! @@ +343,5 @@ ! > + preen(); ! > + } ! > + ! > + bool getBoundingCodeSegment(/*OUT*/uintptr_t* rx_min, ! > + /*OUT*/uintptr_t* rx_max, uintptr_t addr) { ! ! Why are the outparams first? Because it makes the "visual value flow" through the prototype be right-to-left -- values in on the right, out on the left. This is consistent with how it would be if there were no outparams, in which case the only output of the function is its return type, also on the left. ! @@ +28,5 @@ ! > +// ! > +// * SVMA ("Stated VMA"): this is an address of a symbol (etc) as it ! > +// is stated in the symbol table, or other metadata, of an object. ! > +// Such values are typically small and start from zero or ! > +// thereabouts, unless the object has been prelinked. ! ! Does that mean it's really an offset? Not in any obvious way. At best you might be able to claim that if the object is loaded into memory exactly at its SVMA, then it will not require any relocations, since all the function addresses will be consistent with those that are baked into the object's text segment. ! @@ +42,5 @@ ! > +// number of pages. Once we know the bias for a given object's ! > +// text section (for example), we can compute the AVMAs of all of ! > +// its text symbols by adding the bias to their SVMAs. ! > +// ! > +// * "Image address": typically, to read debuginfo from an object we ! ! Any reason this isn't called the "Image VMA"? I wanted to avoid the use of VMA here since it doesn't play any role as far as standards documents (ELF, Dwarf) are concerned, and they tend to use VMA to denote addresses that have link- or execution-time significance. ! @@ +136,5 @@ ! > +// unwinder thread -- the CFI fast cache. Hence unwinder threads ! > +// first need to register with the library, so their identities are ! > +// known. Also, for maximum effectiveness of the CFI caching, it is ! > +// preferable to have a small number of very-busy unwinder threads ! > +// rather than a large number of mostly-idle unwinder threads. ! ! Is write-starvation an issue? E.g. if we're frequently reading, we ! might not get the write-lock when required. If so, what would the ! consequences be? That is a good question. POSIX says that "Implementations are allowed to favour writers over readers to avoid writer starvation." so perhaps the libpthread implementations we use already favour writers. In this case, writer starvation would result in long delays notifying LUL of new mapped in objects if there was concurrently a lot of unwinding going on. But at least how we are using it, all the notifications (of new objects) and hence the reading of debuginfo, is done before any of the sampling threads start unwinding. So it's not a problem. ! > +private: ! > + // Invalidate the caches. Requires mRWlock to be held for writing; ! > + // does not acquire it itself. ! ! Is this requirement asserted? Worth adding the assertion if it's not, ! and mentioning it here. It would be nice to assert that, but I don't know how to do so using the posix pthreads interface. ! > > + int32_t mOffset; // 32-bit signed offset should be more than enough ! > > + int16_t mReg; // A DW_REG_ value ! > > + uint8_t mHow; // UNKNOWN, NODEREF or DEREF ! > ! > Could these fields be |const|? I tried, but it generates a lot of compile errors.
(In reply to Nathan Froyd (:froydnj) from comment #30) > Comment on attachment 8378314 [details] [diff] [review] > bug938157-newlib-13d.cset ! @@ +1155,5 @@ ! > + last_valid_sp = sp; ! > + } else { ! > + MOZ_ASSERT(last_valid_sp.valid()); ! > + if (sp.valid()) { ! > + if (sp.value() < last_valid_sp.value()) { ! ! Do we care about making this portable to architectures where stacks grow up? No. HPPA? Any others? In any case, if it ever does get ported to those, it will as a result of this test fail very obviously to do any unwinding at all. ! > /* If we're the first thread in, we'll need to allocate the buffer ! ! Um. So if multiple threads can get here, we're allocating multiple ! copies of LUL above. That seems bad. We should protect ! initialization of sLUL under g_spinLock? Or does this comment need ! changing for the new world order? Fixed, to allocate a single LUL instance no matter how many unwinder threads there are. This required keeping track of the number of unwinder threads so as also to be able to deallocate the instance when the last unwinder thread exits.
(In reply to Nicholas Nethercote [:njn] from comment #31) > Comment on attachment 8378314 [details] [diff] [review] > bug938157-newlib-13d.cset ! > + // ReadCFI(smap) .. ! > + if (!aMappedImage) { ! > + (void)lul::ReadSymbolData( ! > + string(aFileName), std::vector<string>(), smap, ! > + (void*)aRXavma, mLog); ! ! What'll happen if this fails? LUL will be aware that an object is mapped for the given address range, but will not have any unwind rules for that range, so unwinding through them will fail. LUL can be made to forget about the range in the "standard" way by a subsequent NotifyBeforeUnmap() call.
Attached patch bug938157-lul-15a.cset (obsolete) (deleted) — Splinter Review
Addresses all remaining review comments. ================================================================= == == njn review: comment 27 ! > These are all copied from toolkit/crashreporter/google-breakpad/src, ! > although none of them are 1:1 copies of files in that subtree. Rather ! > they are assembled from various bits of various files in there. ! ! Can you add comments at the top indicating if they are pure copies or ! copies-with-modifications? Every file that is derived from Breakpad is majorly mashed around, often the result of combining several files and throwing away a lot of unneeded code. Is it worth documenting that in each file individually? Maybe add a short README.LUL explaining the origins? ! > How to test is still open. This patch doesn't contain any testing ! > functionality. ! ! As we discussed, a combination of high-level system smoke tests ! (i.e. does it seem to work, not crash, etc) plus some low-level unit ! tests is probably the best you can easily do. Starting the profiler with MOZ_PROFILER_LUL_TEST set to anything runs a short self-test at profiler startup. This creates a number of call sequences through a group of 8 test functions, unwinds from the innermost point, and checks that the unwind sequence is as expected. This will test the ELF, Dwarf and Exidx parsers, the main unwind loop and supporting logic -- that is, most of LUL. The one part this doesn't test is stack scanning, but that's somewhat peripheral. A short test-result summary is printed to the profiler's logging sink, each line prefixed with "LULUnitTest" so as to make it greppable. Currently it does not cause Firefox to then exit. Currently passes self-test on x86-linux and x64-linux. Fails on arm-android but the profiler actually works, so I assume it's something minor. Will fix before landing. ================================================================= == == benwa review: comment 33 ! > +#if defined(SPS_OS_android) && !defined(MOZ_WIDGET_GONK) ! > +// A helper function for temporarily_map_then_Notify(), which ! ! This whole hunk doesn't belong in the UnwinderThread IMO. Per discussion on irc, we can move this to a followup. ! @@ +1472,5 @@ ! > +// as-is to faulty.lib. ! > +// ! > +// Hence only (2) requires special-casing here. ! > +// ! > +bool temporarily_map_then_Notify(lul::LUL* aLUL, ! ! I feel like this is doing two things. Mapping the library in memory ! and passing it to LUL. Perhaps we should have a more generic 'give me ! a mapping of this library' instead. Done. 'class ObjectMapper' and its child classes. This turned out to be quite tricky, but is use infrastructure if in the future we add support for separate debuginfo files, so I persevered. ! ::: tools/profiler/platform.cpp ! @@ +360,5 @@ ! > (int)sProfileEntries); ! > LOGF("SPS: UnwindStackScan = %d (max dubious frames per unwind).", ! > (int)sUnwindStackScan); ! > LOG( "SPS: Use env var MOZ_PROFILER_MODE=help for further information."); ! > + LOG( "SPS: Note that MOZ_PROFILER_MODE=help sets all values to defaults."); ! ! This log is on by default on b2g build. We should make sure it's not ! compiled for non profiling builds. I guarded the Android implementation of LOG/LOGF using moz_profiler_verbose(), like the desktop Linux ones are. As a result it falls completely silent unless the profiler is run with MOZ_PROFILER_VERBOSE set. Is that an acceptable result?
Attachment #8387191 - Attachment is obsolete: true
Attachment #8389439 - Attachment is patch: true
Comment on attachment 8389439 [details] [diff] [review] bug938157-lul-15a.cset Nathan, Benoit, requesting review of the same sections that you respectively looked at before.
Attachment #8389439 - Flags: review?(nfroyd)
Attachment #8389439 - Flags: review?(bgirard)
> Every file that is derived from Breakpad is majorly mashed around, > often the result of combining several files and throwing away a lot of > unneeded code. Is it worth documenting that in each file > individually? Maybe add a short README.LUL explaining the origins? I think documenting each file individually would be good. It can be a short sentence, e.g. "This file contains modified versions of the following files from Breakpad: a.cpp, b.cpp, c.cpp."
Comment on attachment 8389439 [details] [diff] [review] bug938157-lul-15a.cset Review of attachment 8389439 [details] [diff] [review]: ----------------------------------------------------------------- Looks good overall. I think the code could be clearer however. r- because I'd like to have another look. ::: tools/profiler/UnwinderThread2.cpp @@ +411,5 @@ > /* If so, here is the register state and stack. Unset if > .haveNativeInfo is false. */ > + lul::UnwindRegs startRegs; > + lul::StackImage stackImg; > + void* stackMaxSafe; /* VMA for max safe stack reading */ Virtual Memory Address? Wouldn't 'address' be more clear? Everything in user land is VMA. @@ +1446,5 @@ > > buff->aProfile->EndUnwind(); > } > > +////////////////////////////////////////////////////////// I feel like the section in this file are an indication that this should be split into a different file. Particularly the ObjectMapper. This file has 2k lines of code and significant ifdef. As well it would be nice to split it out into UnwinderThread_stub.cpp for non supported platform and Unwinderthread_linux.cpp for support platforms. This will cut the size of the file quite a bit and reduces the ifdef complexity. @@ +1449,5 @@ > > +////////////////////////////////////////////////////////// > +//// BEGIN ObjectMapper > + > +// A (nearly-) RAII class that abstracts away the differences between You might want to disable new/delete then. See JSCustomObjectBuilder.h @@ +1512,5 @@ > + MOZ_ASSERT(mSize == 0); > + return; > + } > + MOZ_ASSERT(mSize > 0); > + // The following assertion doesn't necessarily have to be true, I do believe it -is- against the spec to map something in the first page of memory. FYI It's also against the spec to have an allocation return something that is within sizeof(t) of the last byte(s) of memory so that we never overflow when iterating with pointers. @@ +1557,5 @@ > + } > + > +private: > + // Are we currently holding a mapped object? > + bool mIsMapped; Look like all these fields are shared by both implementation. Is there a good reason not to share them in the base class? @@ +1564,5 @@ > + void* mImage; > + size_t mSize; > + > + // A logging sink, for complaining about mapping failures. > + void (*mLog)(const char*); I don't really understand why we need complex logging when for now I believe we're just logging everything to our logging macro. @@ +1597,5 @@ > +// the installation's lib directory is, since we'll have to look in > +// there to get hold of libmozglue.so. Returned C string is heap > +// allocated and the caller must deallocate it. > +static char* > +get_installation_lib_dir ( void ) This comment doesn't apply to this. Lets move it above the comment. @@ +1601,5 @@ > +get_installation_lib_dir ( void ) > +{ > + nsCOMPtr<nsIProperties> > + directoryService(do_GetService(NS_DIRECTORY_SERVICE_CONTRACTID)); > + if (!directoryService) return nullptr; Please avoid single line if statement. braced if are preferred. This is all over the patch unfortunately :(. @@ +1640,5 @@ > + if (mHdl) { > + __dl_munmap(mHdl, mImage, mSize); > + dlclose(mHdl); > + } else { > + // If mHdl is not set, the problem has been handed to IMO this comment and else block doesn't belong. This isn't an implementation detail but it's a comment about how the outside code uses this. If we don't have something to unmap then it's perfectly fine to do nothing here. @@ +1780,5 @@ > + aLUL->NotifyExecutableArea(lib.GetStart(), lib.GetEnd()-lib.GetStart()); > + } > + > + // |mapper| goes out of scope at this point and so its destructor > + // unmaps the object. Normally RAII doesn't need a comment but if you think it's more clear this way that's ok. Normally putting RAII or Auto in the class name make it more obvious here that this is going to happen. But keeping it as-is is fine. @@ +1856,5 @@ > + // read_procmaps has read unwind information into sLUL, so that the > + // tests have something to unwind against. Without that they'd be > + // pretty meaningless. > + if (doLulTest) { > + RunLulUnitTests(sLUL); You should use GTest for this. All you need is to add a simple test file that has a wrapper and the setup/teardown required: https://developer.mozilla.org/en-US/docs/GTest There's some example in gfx/test/gtest @@ +2019,5 @@ > + MOZ_ASSERT(sLULcount > 0); > + if (sLULcount == 1) { > + // Tell the library to discard unwind info for the entire address > + // space. > + sLUL->NotifyBeforeUnmap(0, UINTPTR_MAX); This feels a bit like a trick. If you're going to use a hack like this can you make it an implementation detail of sLUL instead? Create a header inline method: void ClearMappings(better name?)() { NotifyBeforeUnmap(0, UINTPTR_MAX); } This way if NotifyBeforeUnmap(0, UINTPTR_MAX); is no longer a good way to clear everything we don't have the change the calles. ::: tools/profiler/platform.h @@ +52,5 @@ > #endif > > #define ASSERT(a) MOZ_ASSERT(a) > > +extern bool moz_profiler_verbose(); Does this really need 'extern'. Dropping the keyword should give a forward declaration.
Attachment #8389439 - Flags: review?(bgirard) → review-
Comment on attachment 8389439 [details] [diff] [review] bug938157-lul-15a.cset Review of attachment 8389439 [details] [diff] [review]: ----------------------------------------------------------------- I agree with most of BenWa's review comments. ::: tools/profiler/UnwinderThread2.cpp @@ +1449,5 @@ > > +////////////////////////////////////////////////////////// > +//// BEGIN ObjectMapper > + > +// A (nearly-) RAII class that abstracts away the differences between Or just |class MOZ_STACK_CLASS ObjectMapper|. @@ +1512,5 @@ > + MOZ_ASSERT(mSize == 0); > + return; > + } > + MOZ_ASSERT(mSize > 0); > + // The following assertion doesn't necessarily have to be true, Both of BenWa's statements are incorrect.</pedantic> @@ +1844,5 @@ > + // thread in. > + MOZ_ASSERT(sLULcount > 0); > + sLULcount++; > + // Register this thread so it can do unwinding. > + sLUL->RegisterUnwinderThread(); Might be slightly clearer to move the increment and RegisterUnwinderThread outside of the if (but still inside the mutex lock) so that both branches can share code. Unless read_procmaps needs a thread registered for unwinding already?
Attachment #8389439 - Flags: review?(nfroyd) → review+
Attached patch bug938157-lul-16e.cset (deleted) — Splinter Review
Addresses all remaining review comments. ========================================================== == == benwa comment 42 ! @@ +1446,5 @@ ! > ! > buff->aProfile->EndUnwind(); ! > } ! > ! > +////////////////////////////////////////////////////////// ! ! I feel like the section in this file are an indication that this ! should be split into a different file. Particularly the ! ObjectMapper. This file has 2k lines of code and significant ifdef. AutoObjectMapper got moved to AutoObjectMapper.{cpp,h}. ! @@ +1512,5 @@ ! > + MOZ_ASSERT(mSize == 0); ! > + return; ! > + } ! > + MOZ_ASSERT(mSize > 0); ! > + // The following assertion doesn't necessarily have to be true, ! ! I do believe it -is- against the spec to map something in the first ! page of memory. FYI It's also against the spec to have an allocation ! return something that is within sizeof(t) of the last byte(s) of ! memory so that we never overflow when iterating with pointers. POSIX appears to say that a map of length zero is not allowed. But it doesn't disallow maps at address zero. Indeed I have seen the MacOS mmap implementation return zero on occasion. ! @@ +1564,5 @@ ! > + void* mImage; ! > + size_t mSize; ! > + ! > + // A logging sink, for complaining about mapping failures. !> + void (*mLog)(const char*); ! ! I don't really understand why we need complex logging when for now I ! believe we're just logging everything to our logging macro. This helps decouple LUL from its surroundings and makes it easier to test. For example there are different logging sinks for "normal" operation and when it is run by gtest. That said, it might be good to tidy up the logging as a followup. ! @@ +1601,5 @@ ! > +get_installation_lib_dir ( void ) ! > +{ ! > + nsCOMPtr<nsIProperties> ! > + directoryService(do_GetService(NS_DIRECTORY_SERVICE_CONTRACTID)); ! > + if (!directoryService) return nullptr; ! ! Please avoid single line if statement. braced if are preferred. This ! is all over the patch unfortunately :(. Fixed for this function and the other new stuff in UnwinderThread2.cpp. A lot of the patch looks like it doesn't comply with the house style because it contains code copied from toolkit/crashreporter/google-breakpad, which has a different style. The njn/froydnj consensus seems to be not to mess with the style of copied code.
Attachment #8389439 - Attachment is obsolete: true
Attachment #8396405 - Flags: review?(bgirard)
Attachment #8396405 - Flags: review?(bgirard) → review+
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
\o/ Is it on by default in some configurations?
Attached file error.txt (deleted) —
FYI, this is causing build errors when --enable-warnings-as-errors is enabled (see attachment).
(In reply to Frédéric Wang (:fredw) from comment #48) > FYI, this is causing build errors when --enable-warnings-as-errors is > enabled (see attachment). Yes. This is being tracked by bug 997700. The fix should be on m-i within the next two hours.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: