Closed
Bug 1350500
Opened 8 years ago
Closed 7 years ago
The sharedLibraries returned by nsIProfiler have empty breakpadId fields on Android
Categories
(Core :: Gecko Profiler, enhancement, P1)
Tracking
()
RESOLVED
FIXED
mozilla61
Tracking | Status | |
---|---|---|
firefox61 | --- | fixed |
People
(Reporter: mstange, Assigned: m_kato)
References
Details
Attachments
(3 files, 4 obsolete files)
(deleted),
text/x-review-board-request
|
glandium
:
review+
|
Details |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
This is going to make it hard to find the right symbol table for them.
Reporter | ||
Comment 1•8 years ago
|
||
We try to obtain the breakpadId with the function getId at http://searchfox.org/mozilla-central/rev/7419b368156a6efa24777b21b0e5706be89a9c2f/tools/profiler/core/shared-libraries-linux.cc#26 by mapping the file at the supplied path into memory and parsing the elf structure.
I haven't debugged this, but here are my guesses about what causes this:
For system libraries, we're not going to find the file because we don't have its full path due to bug 1350501.
For Firefox libraries, either we're not able to open a path of the form .../base.apk!/.../libxul.so, or we're running into the fact that these libraries are compressed using xz and so breakpad can't parse them.
glandium, any ideas for what we could do about this? Are there other ways to obtain the breakpadId?
Flags: needinfo?(mh+mozilla)
Reporter | ||
Comment 2•8 years ago
|
||
Having something like ElfLoader::GetMappableFromPath available and somehow being able to pass the result into breakpad's ElfFileIdentifierFromMappedFile would be nice.
Reporter | ||
Comment 3•8 years ago
|
||
I can confirm that fixing bug 1350501 gives us breakpadIds for system libraries. So now we just need to handle the zip path / compressed binary case.
Reporter | ||
Comment 4•8 years ago
|
||
Casting dl_info->dlpi_addr to const void* and passing that to FileID::ElfFileIdentifierFromMappedFile works for the firefox libraries, but not for most system libraries. So in combination with bug 1350501 I think I now have everything I need. This seems like a hack though.
Reporter | ||
Comment 5•8 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #4)
> but not for most system libraries.
I don't understand why, though. The first bytes seem to be as expected, for example /system/lib/libc.so (loaded at 0xb6dfe000) starts with 7f454c4601010100, the 454c46 being the letters ELF.
Comment 6•8 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #5)
> (In reply to Markus Stange [:mstange] from comment #4)
> > but not for most system libraries.
>
> I don't understand why, though. The first bytes seem to be as expected, for
> example /system/lib/libc.so (loaded at 0xb6dfe000) starts with
> 7f454c4601010100, the 454c46 being the letters ELF.
Can you produce a memory dump at that address, for, say, 1KB?
(In reply to Markus Stange [:mstange] from comment #2)
> Having something like ElfLoader::GetMappableFromPath available and somehow
> being able to pass the result into breakpad's
> ElfFileIdentifierFromMappedFile would be nice.
That already exists, and in fact, the profiler uses that to access the eh_frame data (or used to): __dl_mmap/__dl_munmap. In fact, that function works for system libraries too, as long as they are in /system/lib.
Flags: needinfo?(mh+mozilla)
Comment 7•8 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #6)
> eh_frame data (or used to)
seems it still does.
Reporter | ||
Comment 8•8 years ago
|
||
Interesting! I'll take a look. Thanks!
So if the library start address returned by AutoObjectMapperFaultyLib::Map is an address that FileID::ElfFileIdentifierFromMappedFile can successfully dump a breakpadId from, then that means that for those libraries I don't need an absolute path *in order to compute the breakpadId*. However, I'm still interested in the absolute path of these files regardless. And as far as I can tell, __dl_mmap can't really help me with that problem.
But thanks for pointing me to AutoObjectMapper; at least I'll be able to re-use the code around get_installation_lib_dir() in order to get the correct absolute path for libmozglue.so, which was the last piece I needed in bug 1350501.
Comment hidden (mozreview-request) |
Reporter | ||
Updated•8 years ago
|
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Comment 10•8 years ago
|
||
mozreview-review |
Comment on attachment 8862605 [details]
Bug 1350500 - Compute the breakpadId for Firefox libraries on Android by mapping them into memory.
https://reviewboard.mozilla.org/r/134456/#review140412
::: tools/profiler/core/shared-libraries-linux.cc:56
(Diff revision 1)
> # include <sys/types.h>
>
> #elif defined(GP_OS_android) && !defined(MOZ_WIDGET_GONK)
> # define CONFIG_CASE_2 1
> # include "ElfLoader.h" // dl_phdr_info
> +# include "../lul/AutoObjectMapper.h"
Why not add the lul directory to LOCAL_INCLUDES in tools/profiler/moz.build?
::: tools/profiler/core/shared-libraries-linux.cc:73
(Diff revision 1)
> +#if defined(USE_FAULTY_LIB)
> +static void DontLog(const char* aMsg) {}
> +#endif
I'd say it would be better to make AutoObjectMapperPOSIX allow aLog to be NULL, and avoid failedToMessage doing sprintfs altogether in that case.
::: tools/profiler/core/shared-libraries-linux.cc:99
(Diff revision 1)
> PageAllocator allocator;
> auto_wasteful_vector<uint8_t, sizeof(MDGUID)> identifier(&allocator);
Seems like you could move that before the USE_FAULTY_LIB section and remove the duplicate code from that section.
Attachment #8862605 -
Flags: review?(mh+mozilla)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 13•8 years ago
|
||
mozreview-review |
Comment on attachment 8862605 [details]
Bug 1350500 - Compute the breakpadId for Firefox libraries on Android by mapping them into memory.
https://reviewboard.mozilla.org/r/134456/#review143370
::: tools/profiler/core/shared-libraries-linux.cc:59
(Diff revision 2)
> + AutoObjectMapperFaultyLib mapper(nullptr);
> + void* image = nullptr;
> + size_t size = 0;
> + if (mapper.Map(&image, &size, bin_name) && image && size) {
> + if (FileID::ElfFileIdentifierFromMappedFile(image, identifier)) {
> + return FileID::ConvertIdentifierToUUIDString(identifier) + "0";
I'd feel better if there was only one line doing this. But it wouldn't be much readable, except using a goto. meh.
Attachment #8862605 -
Flags: review?(mh+mozilla) → review+
Comment 14•7 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #11)
> Created attachment 8866517 [details]
> Bug 1350500 - Allow passing nullptr for aLog in AutoObjectMapperPOSIX.
[per comments on irc] my concern about this is that it could hide useful
diagnostic info in case of map failure. I'd be happier if it could be
redone so as to cause any failure messages to be sent to stderr, at least.
Updated•7 years ago
|
Attachment #8866517 -
Flags: review?(jseward)
Assignee | ||
Comment 15•7 years ago
|
||
Markus, do you have a time to handle this bug? If no time, I can take this (this bug is very important for remote profiling on Android).
Flags: needinfo?(mstange)
Assignee | ||
Comment 16•7 years ago
|
||
attached fix will cause dead lock since dl_iterate_phdr seems be have the mutex and dlopen tries to get the mutex too.
Reporter | ||
Comment 17•7 years ago
|
||
I do not have time to handle this bug, sorry. I'm assigning it to you.
Assignee: mstange → m_kato
Flags: needinfo?(mstange)
Assignee | ||
Comment 18•7 years ago
|
||
Assignee | ||
Comment 19•7 years ago
|
||
Assignee | ||
Updated•7 years ago
|
Attachment #8866517 -
Attachment is obsolete: true
Assignee | ||
Comment 20•7 years ago
|
||
Comment on attachment 8963477 [details] [diff] [review]
Always build AutoObjectMapper on Android
AutoObjectMapper doesn't build on Android/aarch64. For part 1. fix, we should always build it on Android/aarch64 too.
Attachment #8963477 -
Flags: review?(mstange)
Assignee | ||
Comment 21•7 years ago
|
||
Assignee | ||
Comment 22•7 years ago
|
||
Comment on attachment 8963478 [details] [diff] [review]
Elfloader uses read/write lock instead of mutex
When callback of dl_iterate_phdr uses dlopen (part 1 fix uses it), dlopen causes deadlock since __wrap_dl_iterate_phdr already acquires mutex. So I would like to replace mutex of ElfLoader with read/write lock.
Attachment #8963478 -
Flags: review?(mh+mozilla)
Reporter | ||
Updated•7 years ago
|
Attachment #8963477 -
Flags: review?(mstange) → review?(mh+mozilla)
Updated•7 years ago
|
Priority: -- → P1
Comment 23•7 years ago
|
||
Comment on attachment 8963477 [details] [diff] [review]
Always build AutoObjectMapper on Android
Review of attachment 8963477 [details] [diff] [review]:
-----------------------------------------------------------------
I guess you want that for arm64, which is not covered by the if clause, but is that really going to be useful without the rest of lul?
Attachment #8963477 -
Flags: review?(mh+mozilla)
Comment 24•7 years ago
|
||
Comment on attachment 8963478 [details] [diff] [review]
Elfloader uses read/write lock instead of mutex
Review of attachment 8963478 [details] [diff] [review]:
-----------------------------------------------------------------
Without more information about what deadlock is happening how, I can't tell much, but it would seem you want a reentrant lock (PTHREAD_MUTEX_RECURSIVE), not a read-write one.
Attachment #8963478 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 25•7 years ago
|
||
Use PTHREAD_MUTEX_RECURSIVE to avoid deadlock by dlopen.
Attachment #8963478 -
Attachment is obsolete: true
Attachment #8963834 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 26•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #23)
> Comment on attachment 8963477 [details] [diff] [review]
> Always build AutoObjectMapper on Android
>
> Review of attachment 8963477 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I guess you want that for arm64, which is not covered by the if clause, but
> is that really going to be useful without the rest of lul?
Yes, this is for Android/aarch64. aarch64 profiler support is unstable and lul doesn't implement aarch64 code now.
If I should turn off profiler on Android/aarch64 as default, I will update a patch. How do you think?
Assignee | ||
Comment 27•7 years ago
|
||
by bug 1360322, Android/aarch64 turns on profiler without lul...
Comment 28•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #24)
> Comment on attachment 8963478 [details] [diff] [review]
> Elfloader uses read/write lock instead of mutex
>
> Review of attachment 8963478 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Without more information about what deadlock is happening how, I can't tell
> much, but it would seem you want a reentrant lock (PTHREAD_MUTEX_RECURSIVE),
> not a read-write one.
Okay, I had missed this sentence "When callback of dl_iterate_phdr uses dlopen". Under those conditions, a recursive lock is actually dangerous. A read-write one would work better, but that would be a significant difference with the system linker, which doesn't guarantee you can call dlopen while dl_iterate_phdr'ing. How about filling an array with dl_iterate_phdr and then iterate that array and call dlopen?
Updated•7 years ago
|
Attachment #8963834 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 29•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #28)
> (In reply to Mike Hommey [:glandium] from comment #24)
> > Comment on attachment 8963478 [details] [diff] [review]
> > Elfloader uses read/write lock instead of mutex
> >
> > Review of attachment 8963478 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > Without more information about what deadlock is happening how, I can't tell
> > much, but it would seem you want a reentrant lock (PTHREAD_MUTEX_RECURSIVE),
> > not a read-write one.
>
> Okay, I had missed this sentence "When callback of dl_iterate_phdr uses
> dlopen". Under those conditions, a recursive lock is actually dangerous. A
> read-write one would work better, but that would be a significant difference
> with the system linker, which doesn't guarantee you can call dlopen while
> dl_iterate_phdr'ing. How about filling an array with dl_iterate_phdr and
> then iterate that array and call dlopen?
Hmm, if using copied array, for Android 4.4 suppoort, we need to create LibraryHandleList (for 5.0+) and DebuggingHelper's array list (for 4.1-4.3). Original part 1 fix uses dlopen to get ElfHeader, but callback of dl_iterate_phdr have base address of image. So I should rewrite markus's patch...
Assignee | ||
Updated•7 years ago
|
Attachment #8963477 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Attachment #8963834 -
Attachment is obsolete: true
Assignee | ||
Comment 30•7 years ago
|
||
Update version of previous markus's fix without dlopen.
dlopen has mutex for custom linker, calling dlopen from callback of dl_iterate_phdr causes dead lock. When using custom elf linker, module is already loaded into memory, so we can use ElfFileIdentifierFromMappedFile directly from base address.
Attachment #8965225 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 31•7 years ago
|
||
(if r-, I will create lock-free version dl_iterate_phdr.)
Comment 32•7 years ago
|
||
Comment on attachment 8965225 [details] [diff] [review]
Compute the breakpadId for Firefox libraries on Android by mapping them into memory
Review of attachment 8965225 [details] [diff] [review]:
-----------------------------------------------------------------
::: tools/profiler/core/shared-libraries-linux.cc
@@ +39,5 @@
> # error "Unexpected configuration"
> #endif
>
> +// Get the breakpad Id for the binary file pointed by bin name and base address
> +static std::string getId(const char *bin_name, unsigned long base)
libStart, like everywhere else?
@@ +49,5 @@
> auto_wasteful_vector<uint8_t, sizeof(MDGUID)> identifier(&allocator);
>
> +#if defined(GP_OS_android)
> + if (nsDependentCString(bin_name).Find("!/") != kNotFound) {
> + // This moudule is in APK and compressed. Since this is loaded into memory,
Typo: module.
@@ +51,5 @@
> +#if defined(GP_OS_android)
> + if (nsDependentCString(bin_name).Find("!/") != kNotFound) {
> + // This moudule is in APK and compressed. Since this is loaded into memory,
> + // use it instead of loading again
> + if (FileID::ElfFileIdentifierFromMappedFile((void*)base, identifier)) {
So the problem here is that while it might work now, there is no actual guarantee that ld is forever going to produce binaries where what we find a base points the the elf headers. As a matter of fact, iirc, lld breaks this assumption.
Attachment #8965225 -
Flags: review?(mh+mozilla)
Comment 33•7 years ago
|
||
(In reply to Makoto Kato [:m_kato] from comment #31)
> (if r-, I will create lock-free version dl_iterate_phdr.)
You can't really do that, as long as the system linker is involved.
Why not just change dl_iterate_callback to collect dlpi_name and maybe other information, then change the loop that follows the dl_iterate_phdr call to fill SharedLibraryInfo?
Assignee | ||
Comment 34•7 years ago
|
||
Attachment #8965225 -
Attachment is obsolete: true
Assignee | ||
Comment 35•7 years ago
|
||
Comment on attachment 8967299 [details] [diff] [review]
Part 2. Don't call dlopen on callback
Since aarch64 doesn't implement native stack walker (bug 1450185), there is no AutoObjectMapperFaultyLib on Android/aarch64.
Attachment #8967299 -
Flags: review?(mh+mozilla)
Comment 36•7 years ago
|
||
Comment on attachment 8967299 [details] [diff] [review]
Part 2. Don't call dlopen on callback
Review of attachment 8967299 [details] [diff] [review]:
-----------------------------------------------------------------
::: tools/profiler/core/shared-libraries-linux.cc
@@ +31,5 @@
> #if defined(GP_OS_linux)
> # include <link.h> // dl_phdr_info
> #elif defined(GP_OS_android)
> +#if defined(GP_PLAT_arm_android) || defined(GP_PLAT_x86_android)
> +// Android/aarch64 doesn't implement native stack walker yet.
If you're adding that because the previous patch breaks the build for aarch64, please fix the previous patch instead. If the build is otherwise not broken, please move this to a separate bug.
@@ +66,5 @@
>
> PageAllocator allocator;
> auto_wasteful_vector<uint8_t, sizeof(MDGUID)> identifier(&allocator);
>
> +#if defined(GP_PLAT_arm_android) || defined(GP_PLAT_x86_android)
same here
@@ +110,5 @@
>
> static int
> dl_iterate_callback(struct dl_phdr_info *dl_info, size_t size, void *data)
> {
> + nsTArray<LoadedLibraryInfo>* libInfoList =
this is one of the cases where using auto is fine.
Attachment #8967299 -
Flags: review?(mh+mozilla) → review+
Assignee | ||
Comment 37•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #36)
> Comment on attachment 8967299 [details] [diff] [review]
> Part 2. Don't call dlopen on callback
>
> Review of attachment 8967299 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: tools/profiler/core/shared-libraries-linux.cc
> @@ +31,5 @@
> > #if defined(GP_OS_linux)
> > # include <link.h> // dl_phdr_info
> > #elif defined(GP_OS_android)
> > +#if defined(GP_PLAT_arm_android) || defined(GP_PLAT_x86_android)
> > +// Android/aarch64 doesn't implement native stack walker yet.
>
> If you're adding that because the previous patch breaks the build for
> aarch64, please fix the previous patch instead. If the build is otherwise
> not broken, please move this to a separate bug.
So I added https://bugzilla.mozilla.org/attachment.cgi?id=8963477 at first time due to build break on android/aarch64.
Assignee | ||
Comment 38•7 years ago
|
||
Comment on attachment 8963477 [details] [diff] [review]
Always build AutoObjectMapper on Android
AutoObjectMapper works even if on aarch64 and this requires on all android by part 1.
Attachment #8963477 -
Attachment is obsolete: false
Attachment #8963477 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 39•7 years ago
|
||
If bug 1450185 is landed, all android has native stack walker, so comment #38's patch is unnecessary.
Assignee | ||
Comment 40•7 years ago
|
||
Comment on attachment 8963477 [details] [diff] [review]
Always build AutoObjectMapper on Android
cancel because I will be waiting for bug 1450185
Attachment #8963477 -
Flags: review?(mh+mozilla)
Comment 41•7 years ago
|
||
Pushed by m_kato@ga2.so-net.ne.jp:
https://hg.mozilla.org/integration/mozilla-inbound/rev/81e24b24a789
Compute the breakpadId for Firefox libraries on Android by mapping them into memory. r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/cdd70d25704e
Don't call dlopen on callback. r=glandium
Comment 42•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/81e24b24a789
https://hg.mozilla.org/mozilla-central/rev/cdd70d25704e
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Reporter | ||
Comment 43•7 years ago
|
||
Thanks for driving this to the finish line! I just got a profile from today's Firefox Nightly for Android and it contained the expected breakpadIds.
Unfortunately, I didn't get any symbols in the profile. I think this is caused by a bug in the symbol API - the necessary symbols are on the server, but the API isn't handing them to us. I've filed bug 1457706 about this.
Updated•6 years ago
|
status-firefox55:
affected → ---
You need to log in
before you can comment on or make changes to this bug.
Description
•