Closed Bug 922548 Opened 11 years ago Closed 7 years ago

[Flatfish][Profiling]: Fails to analyze the parsed file for Gecko Profiling tool.

Categories

(Firefox OS Graveyard :: Performance, defect, P4)

ARM
Gonk (Firefox OS)
defect

Tracking

(tracking-b2g:-)

RESOLVED WONTFIX
tracking-b2g -

People

(Reporter: vliu, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [c=profiling p= s= u=flatfish] [flatfish][TCP=breakage])

Attachments

(4 files)

This bug is issued because we failed to parse Gecko Profiling tool on Flatfish. The profiling tool is based on the below link. http://bit.ly/15GO4Y8
Attached file profile_2180_b2g.txt (deleted) —
By running the profile.sh, we got the below messages. profile_2180_b2g.txt is attached to debug. vincentliu@vincentliu-BM6660-BM6360:~/projs/a31-422$ ./profile.sh start Stopping b2g ..... stopped. Removing old profile files (from device) ... done Starting b2g with profiling enabled ... started vincentliu@vincentliu-BM6660-BM6360:~/projs/a31-422$ ./profile.sh capture Signaling Profiled Processes: 2180 Stabilizing 2180 b2g ... Pulling /data/local/tmp/profile_0_2180.txt into profile_2180_b2g.txt Adding symbols to profile_2180_b2g.txt and creating profile_2180_b2g.sym ... Removing old profile files (from device) ... done vincentliu@vincentliu-BM6660-BM6360:~/projs/a31-422$ ./profile.sh capture Signaling Profiled Processes: 2180 2338 Stabilizing 2180 b2g ... Pulling /data/local/tmp/profile_0_2180.txt into profile_2180_b2g.txt Adding symbols to profile_2180_b2g.txt and creating profile_2180_b2g.sym ... Traceback (most recent call last): File "./scripts/profile-symbolicate.py", line 331, in <module> main() File "./scripts/profile-symbolicate.py", line 296, in main profile = json.load(open(args.filename, "rb")) File "/usr/lib/python2.7/json/__init__.py", line 278, in load **kw) File "/usr/lib/python2.7/json/__init__.py", line 326, in loads return _default_decoder.decode(s) File "/usr/lib/python2.7/json/decoder.py", line 366, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode obj, end = self.scan_once(s, idx) ValueError: Expecting , delimiter: line 1 column 376832 (char 376832) Stabilizing 2338 (Preallocated a ... Pulling /data/local/tmp/profile_2_2338.txt into profile_2338_(Preallocated a.txt Adding symbols to profile_2338_(Preallocated a.txt and creating profile_2338_(Preallocated a.sym ... Traceback (most recent call last): File "./scripts/profile-symbolicate.py", line 331, in <module> main() File "./scripts/profile-symbolicate.py", line 296, in main profile = json.load(open(args.filename, "rb")) File "/usr/lib/python2.7/json/__init__.py", line 278, in load **kw) File "/usr/lib/python2.7/json/__init__.py", line 326, in loads return _default_decoder.decode(s) File "/usr/lib/python2.7/json/decoder.py", line 366, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode obj, end = self.scan_once(s, idx) ValueError: Expecting , delimiter: line 1 column 968719 (char 968719) Merging profile: profile_2180_b2g.sym profile_2338_(Preallocated a.sym Traceback (most recent call last): File "./gecko/tools/profiler/merge-profiles.py", line 79, in <module> MergeProfiles(sys.argv[1:]) File "./gecko/tools/profiler/merge-profiles.py", line 23, in MergeProfiles raise Exception("Filename '" + fname + "' doesn't match expected pattern") Exception: Filename 'profile_2338_(Preallocated' doesn't match expected pattern Results: profile_captured.sym Removing old profile files (from device) ... done
Near 376832 byte of profile_2180_b2g.txt, I found several bytes of garbage data.
Flags: needinfo?(bgirard)
My bet is the custom UTF-8 code is busted. Did you have any custom profiler label/marker?
Flags: needinfo?(bgirard)
Attached file profiling_data.zip (deleted) —
Put more profiling data for analysis.
(In reply to Benoit Girard (:BenWa) from comment #3) > My bet is the custom UTF-8 code is busted. Did you have any custom profiler > label/marker? I made some experiment with the device and found that the corrupted bytes can appear in quite random places in the profile data. In one capture we even saw the label nsAppShell::ProcessNextNativeEvent being corrupted. I also made a check against the string before it was written out to the file so we can rule out storage hardware problems. This looks like more general corruption of buffer.
according triage result, mark it koi+
Blocks: flatfish
blocking-b2g: --- → koi+
Whiteboard: [Flatfish only]
Can you run this under valgrind?
Blocks: 920921
No longer blocks: flatfish
(In reply to Andreas Gal :gal from comment #7) > Can you run this under valgrind? Yes, I will try to run it under valgrind.
Is it possible the JS scripts contain non ASCII characters?
(In reply to Vincent Liu[:vliu] from comment #8) > (In reply to Andreas Gal :gal from comment #7) > > Can you run this under valgrind? > > Yes, I will try to run it under valgrind. Hello Vincent, To enable valgrind, I think you may need to add external/valgrind into your manifest, not only export B2G_VALGRIND=1 in your userconfig. You can refer nexus-4's manifest. Running valgrind on device may need 1GB+ device RAM, I think that's why we didn't add this into all manifests except nexus-4.
(In reply to Alan Huang [:ahuang] from comment #10) > (In reply to Vincent Liu[:vliu] from comment #8) > > (In reply to Andreas Gal :gal from comment #7) > Running valgrind on device may need 1GB+ device RAM, I think that's why we > didn't add this into all manifests except nexus-4. Oh yes, it is bug 909566
(In reply to Alan Huang [:ahuang] from comment #10) > (In reply to Vincent Liu[:vliu] from comment #8) > > (In reply to Andreas Gal :gal from comment #7) > > > Can you run this under valgrind? > > > > Yes, I will try to run it under valgrind. > > Hello Vincent, > To enable valgrind, I think you may need to add external/valgrind into your > manifest, not only export B2G_VALGRIND=1 in your userconfig. You can refer > nexus-4's manifest. Please add below in your base-jb.xml then "repo sync", or just copy those 2 folders :) <project path="external/valgrind" name="valgrind" remote="b2g" revision="fxos" /> <project path="external/VEX" name="vex" remote="b2g" revision="fxos" />
(In reply to Alan Huang [:ahuang] from comment #12) > (In reply to Alan Huang [:ahuang] from comment #10) > > (In reply to Vincent Liu[:vliu] from comment #8) > > > (In reply to Andreas Gal :gal from comment #7) > > > > Can you run this under valgrind? > > > > > > Yes, I will try to run it under valgrind. > > > > Hello Vincent, > > To enable valgrind, I think you may need to add external/valgrind into your > > manifest, not only export B2G_VALGRIND=1 in your userconfig. You can refer > > nexus-4's manifest. > > Please add below in your base-jb.xml then "repo sync", or just copy those 2 > folders :) > <project path="external/valgrind" name="valgrind" remote="b2g" > revision="fxos" /> > <project path="external/VEX" name="vex" remote="b2g" revision="fxos" /> Thanks, Alan. I think I missed catching external/VEX so I always got built fail.
(In reply to Benoit Girard (:BenWa) from comment #9) > Is it possible the JS scripts contain non ASCII characters? The garbage data is immediately after a timestamp, not a string, I don't think it is a part of a non-ascii string. You can find that at attached data.
Summary: [Flatfish][Profiling]: Fails to parse Gecko Profiling tool. → [Flatfish][Profiling]: Fails to analysis the parsed file for Gecko Profiling tool.
Keywords: perf
Whiteboard: [Flatfish only] → [c=profiling p= s= u=1.2] [Flatfish only]
Assignee: nobody → vliu
Priority: -- → P1
Hi Jerry/ BenWa, Please update what you found in today
Flags: needinfo?(hshih)
Flags: needinfo?(bgirard)
Hi Vincent, I looked into my flatfish build but find something weird. The PSS of /system/b2g/b2g is nearly 240MB after I start profiling. I met this three times. Even though flashfish has large RAM, this should still be a problem. I'm now building gecko with DMD to take a look.
(In reply to C.J. Ku[:CJKu] from comment #15) > Hi Jerry/ BenWa, > Please update what you found in today BenWa and I still work on this issue. When we disable some profiler core routine, the system is not crashed. We are finding which function causes this problem.
Flags: needinfo?(hshih)
Attached file smaps (deleted) —
Observed an unusually large memory block, too. 48500000-53500000 rwxp 00000000 00:00 0 Size: 180224 kB Rss: 176000 kB Pss: 176000 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 176000 kB Referenced: 176000 kB Anonymous: 176000 kB AnonHugePages: 0 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Locked: 0 kB
Attached patch mmap.diff (deleted) — Splinter Review
Since it is a single-large allocation, it's straightforward to break in mmap and exam the stacktrace: #0 0x4010ed80 in mmap (addr=0x0, size=134217728, prot=3, flags=34, fd=-1, offset=0) at bionic/libc/unistd/mmap.c:39 #1 0x400bc726 in pages_map (addr=0x0, size=134217728, pfd=<optimized out>) at ../../../gecko/memory/mozjemalloc/jemalloc.c:2407 #2 0x400bd6ca in chunk_alloc_mmap (size=134217728, pagefile=<optimized out>) at ../../../gecko/memory/mozjemalloc/jemalloc.c:2805 #3 chunk_alloc (size=<optimized out>, zero=<optimized out>, pagefile=<optimized out>) at ../../../gecko/memory/mozjemalloc/jemalloc.c:2914 #4 0x400be754 in huge_malloc (size=134217728, zero=<optimized out>) at ../../../gecko/memory/mozjemalloc/jemalloc.c:5020 #5 0x400bedf4 in huge_ralloc (oldsize=67108864, size=134217728, ptr=0x49a00000) at ../../../gecko/memory/mozjemalloc/jemalloc.c:5295 #6 iralloc (size=134217728, ptr=0x49a00000) at ../../../gecko/memory/mozjemalloc/jemalloc.c:4857 #7 realloc (ptr=0x49a00000, size=134217728) at ../../../gecko/memory/mozjemalloc/jemalloc.c:6501 #8 0x41c8c2be in js_realloc (bytes=134217728, p=0x49a00000) at ./../../dist/include/js/Utility.h:159 #9 realloc_ (p=0x49a00000, this=0xbe9c64e0, bytes=134217728, oldBytes=<optimized out>) at ../../../gecko/js/src/jsalloc.h:68 #10 growTo (newCap=134217728, v=...) at ./../../dist/include/mozilla/Vector.h:180 #11 mozilla::VectorBase<unsigned char, 0u, js::TempAllocPolicy, js::Vector<unsigned char, 0u, js::TempAllocPolicy> >::growStorageBy (this=0xbe9c64e0, incr=<optimized out>) at ./../../dist/include/mozilla/Vector.h:746 #12 0x41c8c400 in append<unsigned char> (u=<optimized out>, this=<optimized out>) at ./../../dist/include/mozilla/Vector.h:1002 #13 AllocSrcNote (cx=0x405d7b60, notes=...) at ../../../gecko/js/src/frontend/BytecodeEmitter.cpp:6137 #14 0x41c8c42c in js::frontend::NewSrcNote (cx=0x405d7b60, bce=<optimized out>, type=js::SRC_NEWLINE) at ../../../gecko/js/src/frontend/BytecodeEmitter.cpp:6150 #15 0x41c8e366 in UpdateLineNumberNotes (offset=<optimized out>, bce=0xbe9c6478, cx=0x405d7b60) at ../../../gecko/js/src/frontend/BytecodeEmitter.cpp:358 #16 js::frontend::EmitTree (cx=0x405d7b60, bce=0xbe9c6478, pn=0x44d71140) at ../../../gecko/js/src/frontend/BytecodeEmitter.cpp:5716 #17 0x41c90304 in js::frontend::EmitFunctionScript (cx=0x405d7b60, bce=0xbe9c6478, body=0x44d71140) at ../../../gecko/js/src/frontend/BytecodeEmitter.cpp:2634
Ting-Yuan, if I skip the ThreadProfile::addTag(ProfileEntry aTag) function, it seems no crash. But Benwa and I think it should not be buggy in this function.
Yes, I don't understand what's causing the problem. That code is very well tested.
Flags: needinfo?(bgirard)
376832 is 0x5c000. So we're clobbering 12 bytes at the start of a page: 0005bff0 3a 31 30 31 33 34 2e 30 39 34 38 33 37 39 39 39 |:10134.094837999| 0005c000 e0 bc 92 e9 ac a8 e1 98 bb e4 83 a7 2c 7b 22 6e |............,{"n| Those bytes are valid UTF-8, which seems like too much to ascribe to coincidence, but the ordinals are 0f12 9b28 163b 40e7 which I'm pretty sure isn't an actual piece of text we'd be handling (see http://www.unicode.org/charts/nameslist/mainList.html ). From attachment 812957 [details], profile_2093_b2g.txt: 00065ff0 76 65 6e 74 22 2c 22 6c 69 6e 65 22 3a 38 32 33 |vent","line":823| 00066000 e7 8c a5 ee bc ac e2 a9 97 e4 83 9f 6c 6f 63 61 |............loca| (Unicode: 7325 ef2c 2a57 40df) 000cd800 6d 65 73 22 3a 5b 7b 22 e7 b7 88 e7 96 ab e3 9b |mes":[{"........| 000cd810 ad e4 83 9f 74 69 6f 6e 22 3a 22 53 74 61 72 74 |....tion":"Start| (Unicode: 7dc8 75ab 36ed 40df; the correct text is probably “loca”.) profile_3349_b2g.txt: 000baff0 39 39 39 64 66 36 22 7d 5d 2c 22 74 69 6d 65 22 |999df6"}],"time"| 000bb000 e8 9a 90 ee 99 93 e8 8c a3 e4 83 9e 30 36 2e 32 |............06.2| 000bb010 35 35 33 34 31 7d 2c 7b 22 6e 61 6d 65 22 3a 22 |55341},{"name":"| (Unicode: 8690 e653 8323 40de) profile_4198_b2g.txt is truncated in the middle of a string, but it's not clobbered: 00046fb0 3a 22 57 65 62 61 70 70 73 41 70 70 6c 69 63 61 |:"WebappsApplica| 00046fc0 74 69 6f 6e 2e 70 72 6f 74 6f 74 79 70 65 2e 69 |tion.prototype.i| 00046fd0 6e 69 74 20 28 6a 61 72 3a 66 69 6c 65 3a 2f 2f |nit (jar:file://| 00046fe0 2f 73 79 73 74 65 6d 2f 62 32 67 2f 6f 6d 6e 69 |/system/b2g/omni| 00046ff0 2e 6a 61 21 2f 63 6f 6d 70 6f 6e 65 6e 74 73 2f |.ja!/components/| 00047000 profile_16155_b2g.txt: 00000ff0 74 5c 22 3a 31 31 31 32 38 38 33 32 30 30 2c 5c |t\":1112883200,\| 00001000 e0 bb a1 e2 a7 94 e2 84 b8 e4 83 98 5c 22 3a 31 |............\":1| 00001010 31 31 33 38 34 35 37 36 30 2c 5c 22 6f 66 66 73 |113845760,\"offs| (Unicode: 0ee1 29d4 2138 40d8; correct text is probably “"end”.) 00009730 69 6e 65 22 3a 33 38 38 36 7d 2c 7b 22 6c 6f 63 |ine":3886},{"loc| 00009740 61 74 69 6f 6e 22 3a 22 e2 85 8f ee a9 97 e2 99 |ation":"........| 00009750 96 e4 83 98 63 61 34 61 66 66 22 7d 2c 7b 22 6c |....ca4aff"},{"l| (Unicode: 214f ea57 2656 40d8; correct text is probably 0x and two hexdigits.) 0000a800 6a 61 72 3a 66 69 6c 65 3a 2f 2f 2f 73 79 73 74 |jar:file:///syst| 0000a810 ef bf bd d3 b6 e1 ba af e4 83 98 32 67 2f 6f 6d |...........2g/om| 0000a820 6e 69 2e 6a 61 21 2f 63 6f 6d 70 6f 6e 65 6e 74 |ni.ja!/component| (Unicode: fffd 04f6 1eaf 40d8 -- here it's only 11 bytes, but one of the characters is a 2-byte UTF-8 sequence. Note that that's a real FFFD, not a decoding error on my end; maybe the original halfword was in the surrogate pair space?) So we're overwriting 4 characters in a UTF-16 representation of the output file? At addresses that are at least 2k-aligned when first encountered, and the offset in the file increases by 8 (the difference in number of bytes between correct text and incorrect) from that alignment each time — which strongly suggests they're page-aligned in the original UTF-16. (Assuming that "the original UTF-16" exists; I'm not very familiar with this part of the profiler.) The last halfword is interesting, because it looks like it the last two halfwords are an address — if the attached smaps file is representative (there's some randomization of mappings, but not much), it'd probably be in libxul's text section. Which would explain why they're mostly odd, meaning Thumb addresses. (Not entirely so, but there are some ARM-mode asm routines that are hot for some workloads, I think? pixman, maybe?) It'd be interesting to see what's at that address (e.g., if it's after a call site, or if it's the start of a function that's called by pointer somewhere).
Status: NEW → ASSIGNED
Updating Target Milestone for FxOS Perf koi+'s.
Target Milestone: --- → 1.2 C4(Nov8)
Whiteboard: [c=profiling p= s= u=1.2] [Flatfish only] → [c=profiling p= s= u=1.2] [Flatfish only][developer+]
as we confirmed that Flatfish will use v1.3 and we need to respect current testing cycle, set target milestone to 12/6 since 1.3FC tag is not created yet.
blocking-b2g: koi+ → 1.3+
Target Milestone: 1.2 C4(Nov8) → 1.3 Sprint 6 - 12/6
Whiteboard: [c=profiling p= s= u=1.2] [Flatfish only][developer+] → [c=profiling p= s= u=1.3] [Flatfish only][developer+]
hi Thinker, may i know if there's any update for this issue? as you know, we will release tablet for developer, this tool is a key item to help them.
Flags: needinfo?(tlee)
Francis, since Ting-Yuan and other people are busy on other tasks. I would seek another engineer to take this bug in couple days.
Flags: needinfo?(tlee)
Moving from 1.3+ to 1.4+ since Flatfish will use 1.4.
blocking-b2g: 1.3+ → 1.4+
Summary: [Flatfish][Profiling]: Fails to analysis the parsed file for Gecko Profiling tool. → [Flatfish][Profiling]: Fails to analyze the parsed file for Gecko Profiling tool.
Whiteboard: [c=profiling p= s= u=1.3] [Flatfish only][developer+] → [c=profiling p= s= u=1.4] [Flatfish only][developer+]
Target Milestone: 1.3 Sprint 6 - 12/6 → ---
Whiteboard: [c=profiling p= s= u=1.4] [Flatfish only][developer+] → [c=profiling p= s= u=flatfish] [Flatfish only][developer+]
This doesn't fall under the QC blocking feature list & DSDS feature list. Renoming.
blocking-b2g: 1.4+ → 1.4?
blocking-b2g: 1.4? → backlog
Setting as P4. See FxOS Perf Priority Levels here: https://wiki.mozilla.org/FirefoxOS/Performance/Triage
Component: General → Performance
Priority: P1 → P4
I tried |./profile.sh capture| 10 times, but couldn't reproduce the bug with my local build today. Vincent, can you still reproduce it? Or maybe I missed anything? Version: gecko: 028ca01a20110e3d1bbb775bdc8929c69fe7b987 gaia: 7f7dda12b10ae62b674fa2ebdaeb53dea6f2976c Build instruction: GAIA_DISTRIBUTION_DIR=distribution_tablet B2G_SYSTEM_APPS=1 B2G_UPDATER=1 ./build.sh
Flags: needinfo?(vliu)
I noticed JS_Stringify() is not used anymore in SaveProfileTask.cpp after bug 867728 landed. I guess that's the reason why I couldn't repro. Running with profiler enabled can still see strange javascript errors randomly (probably corrupted) from my latest build, such as: JavaScript error: chrome://b2g/content/settings.js, line 466: missing catch or finally after try JavaScript error: chrome://b2g/content/shell.js, line 7: Cu is not defined System JS : ERROR jar:file:///system/b2g/omni.ja!/components/nsPrompter.js:553 - SyntaxError: illegal character
We've changed how the profiler are saved to not depend on the javascript engine so perhaps it fixes this bug as well. Last time I looked the flatflish profiling would hang the device so thats why this bug hasn't gotten much attention.
BenWa, thanks for the information. I'll take a look at bug 926734 then.
It seems BenWa had verified it. Thanks. I also reassign owner because I am not working on this now.
Assignee: vliu → nobody
Flags: needinfo?(vliu)
Whiteboard: [c=profiling p= s= u=flatfish] [Flatfish only][developer+] → [c=profiling p= s= u=flatfish] [flatfish][TCP]
Whiteboard: [c=profiling p= s= u=flatfish] [flatfish][TCP] → [c=profiling p= s= u=flatfish] [flatfish][TCP=breakage]
Blocks: flatfish
blocking-b2g: backlog → ---
Firefox OS is not being worked on
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: