Closed
Bug 1206634
Opened 9 years ago
Closed 9 years ago
[MTBF] After 20 hours executing, test cases failed consequently
Categories
(Firefox OS Graveyard :: Stability, defect, P2)
Tracking
(blocking-b2g:2.5+)
RESOLVED
FIXED
blocking-b2g | 2.5+ |
People
(Reporter: pyang, Assigned: ting)
References
Details
(Whiteboard: [MemShrink:P1])
Attachments
(2 files, 1 obsolete file)
Traceback (most recent call last):
File "/var/jenkins/workspace/flamekk.vmaster.moztwlab01.512/label/moztwlab-01/mtbf-env/local/lib/python2.7/site-packages/marionette_client-0.17-py2.7.egg/marionette/runner/mixins/reporting.py", line 256, in gather_debug
debug['source'] = self.marionette.page_source
File "/var/jenkins/workspace/flamekk.vmaster.moztwlab01.512/label/moztwlab-01/mtbf-env/local/lib/python2.7/site-packages/marionette_driver-0.13-py2.7.egg/marionette_driver/marionette.py", line 1164, in page_source
return self._send_message("getPageSource", key="value")
File "/var/jenkins/workspace/flamekk.vmaster.moztwlab01.512/label/moztwlab-01/mtbf-env/local/lib/python2.7/site-packages/marionette_driver-0.13-py2.7.egg/marionette_driver/decorators.py", line 36, in _
return func(*args, **kwargs)
File "/var/jenkins/workspace/flamekk.vmaster.moztwlab01.512/label/moztwlab-01/mtbf-env/local/lib/python2.7/site-packages/marionette_driver-0.13-py2.7.egg/marionette_driver/marionette.py", line 711, in _send_message
self._handle_error(resp)
File "/var/jenkins/workspace/flamekk.vmaster.moztwlab01.512/label/moztwlab-01/mtbf-env/local/lib/python2.7/site-packages/marionette_driver-0.13-py2.7.egg/marionette_driver/marionette.py", line 752, in _handle_error
raise errors.lookup(error)(message, stacktrace=stacktrace)
MarionetteException: MarionetteException: curContainer.frame is null
STR: run mtbf over 20 hours
EXPECT: over 100 hours
Updated•9 years ago
|
Flags: needinfo?(bchien)
Updated•9 years ago
|
Reporter | ||
Updated•9 years ago
|
Blocks: mtbf-2015q3
Assignee | ||
Comment 2•9 years ago
|
||
We can start from adding some logs to listener.js to see where set curContainer.frame null.
Comment 3•9 years ago
|
||
ni David for comment on Marrionette Exception. This exception was raised from MTBF testing. please help to give us some feedback.
Assignee: nobody → janus926
Flags: needinfo?(dburns)
Comment 4•9 years ago
|
||
Can this be easily created? That error is suggesting that there is no document to be able to get the source.
Flags: needinfo?(dburns)
Reporter | ||
Comment 5•9 years ago
|
||
Add memory report
Assignee | ||
Comment 6•9 years ago
|
||
From the device I reproduced today, I saw frequent memory pressure, USS of B2G process takes >130MB.
Assignee | ||
Comment 7•9 years ago
|
||
(In reply to Paul Yang [:pyang] from comment #5)
> Created attachment 8668852 [details]
There're two big chunks, large memory-file-data and heap unclassified:
181.86 MB (100.0%) -- explicit
├───78.08 MB (42.94%) -- dom
│ ├──78.07 MB (42.93%) -- memory-file-data
│ │ ├──76.35 MB (41.98%) -- large
│ │ │ ├──33.25 MB (18.28%) ── file(length=201666, sha1=189b37fbf1fde69adc714e419c1ce4324a054d93) [152]
│ │ │ ├──12.25 MB (06.73%) ++ (65 tiny)
│ │ │ ├───8.50 MB (04.67%) ── file(length=257119, sha1=c522aa26de553674e51c89e2b397d16a4ce73d51) [34]
│ │ │ ├───5.39 MB (02.96%) ── file(length=73595, sha1=4428ff94c438931de93776bd69ab1eb3f3c98416) [69]
│ │ │ ├───5.13 MB (02.82%) ── file(length=125524, sha1=30aa9e41b906f96dfd7697dc660a245a9d4b339c) [41]
│ │ │ ├───4.63 MB (02.54%) ── file(length=130677, sha1=3a02c9cb99e691c546170ab91ceaf92fd10dc606) [37]
│ │ │ ├───2.77 MB (01.53%) ── file(length=36100, sha1=079300a794680db80b8648ba2284344f6d142990) [71]
│ │ │ ├───2.41 MB (01.32%) ── file(length=55507, sha1=9513ea348f7a6874de089a73e742c631b085a7c7) [44]
│ │ │ └───2.03 MB (01.12%) ── file(length=154687, sha1=df8f2b598e528ddeb88f7968863f26cf796bcde9) [13]
├───38.52 MB (21.18%) ── heap-unclassified
Updated•9 years ago
|
Blocks: TV_Gecko_P2
Assignee | ||
Comment 8•9 years ago
|
||
The device reset while pulling gc/cc logs, but still got the memory report and DMD.
347.16 MB (100.0%) -- explicit
├──188.96 MB (54.43%) -- dom
│ ├──188.95 MB (54.43%) -- memory-file-data
│ │ ├──184.15 MB (53.05%) -- large
│ │ │ ├───85.31 MB (24.57%) ── file(length=188369, name=, path=, type=image/png, sha1=766755f3ce99c5b1d3bbc325257e2b68eca2839c) [455]
│ │ │ ├───14.88 MB (04.29%) ++ (189 tiny)
│ │ │ ├───14.58 MB (04.20%) ── file(length=43637, name=, path=, type=image/jpeg, sha1=61efad1b973d9550a950a7d50dab1b572d74ab96) [311]
│ │ │ ├───14.38 MB (04.14%) ── file(length=156812, name=, path=, type=image/jpeg, sha1=961e0d10b5c7691ec925d503aa59073dbbbae250) [92]
│ │ │ ├────9.95 MB (02.87%) ── file(length=110450, name=, path=, type=image/png, sha1=a2daae3ebcc80df9a215f88c0abc1aa95acff7f1) [91]
│ │ │ ├────9.52 MB (02.74%) ── file(length=110175, name=, path=, type=image/png, sha1=e7de7c9506b8d1508b2de9bdba92b07dcc062340) [87]
│ │ │ ├────7.97 MB (02.30%) ── file(length=146587, name=, path=, type=image/jpeg, sha1=3bbef36280af9025b96372f0049823df8822702f) [51]
│ │ │ ├────7.34 MB (02.12%) ── file(length=36100, name=, path=, type=image/jpeg, sha1=079300a794680db80b8648ba2284344f6d142990) [188]
│ │ │ ├────6.34 MB (01.83%) ── file(length=206292, name=, path=, type=image/png, sha1=533a1f920b3159a60360bd2f24e7fbc1dc99e128) [29]
│ │ │ ├────6.19 MB (01.78%) ── file(length=93846, name=, path=, type=image/jpeg, sha1=40f59205a87e0d561f0662891eed6dc48405165b) [66]
│ │ │ ├────4.13 MB (01.19%) ── file(length=94304, name=, path=, type=image/jpeg, sha1=0cfb952df1a1293e9c5ac6c2313317c94e5a162e) [44]
│ │ │ └────3.56 MB (01.03%) ── file(length=188901, name=, path=, type=image/png, sha1=52748a1aa1f3bd3249ad79838a8b83acd0f7372c) [19]
│ │ └────4.80 MB (01.38%) ── small
├───50.56 MB (14.56%) ── heap-unclassified
Assignee | ||
Comment 9•9 years ago
|
||
Kyle, I couldn't find BlobImplFile or similar Blob corresponding to the large memory-file-data from cc/gc edges in attachment 8668852 [details]. There're some from GridView, but I don't know are they related.
Would you shed me some light on how to debug this further? Thanks.
Flags: needinfo?(khuey)
At least some of the blob JS objects show us as [garbage] in the CC log, meaning they were collected when the memory report was taken. Did you use --minimize to GC first? That might help.
Flags: needinfo?(khuey)
Assignee | ||
Comment 11•9 years ago
|
||
I've confirmed with Paul that attachment 8668852 [details] was collected with --minimize. Grabbed a new one on the device reproduced today.
Attachment #8674646 -
Attachment is obsolete: true
Assignee | ||
Comment 12•9 years ago
|
||
From the attachment 8676014 [details], I can still see [garbage] blob JS objects. But I also noticed this:
0xb04b0bf0 [rc=4] FragmentOrElement (xhtml) iframe id='browser163' class='browser' (orphan) app://system.gaiamobile.org/index.html
which is also a [garbage].
Comment 13•9 years ago
|
||
any progress?
Assignee | ||
Comment 14•9 years ago
|
||
Am still working on it, not much progress though...
Assignee | ||
Comment 15•9 years ago
|
||
I'm still chasing down this one from cc log of attachment 8676014 [details]:
0xa9c855f0 [rc=13] FragmentOrElement (xhtml) div id='AppWindow_2' class='appWindow render collapsible light browser scrollable inactive' (orphan) app://system.gaiamobile.org/index.html
Assignee | ||
Comment 16•9 years ago
|
||
Bug 1217289 is filed for the leak of AppWindow_2, not sure is it all the leaks...
Assignee | ||
Comment 17•9 years ago
|
||
I left 3 devices with the patch of bug 1217289 on Friday, and they're still running this morning.
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #17)
> I left 3 devices with the patch of bug 1217289 on Friday, and they're still
> running this morning.
Unfortunately, there still seems are memory leaks and all the processes get killed when I |./tools/get_about_memory.py -m|.
Assignee | ||
Comment 19•9 years ago
|
||
In DMD, there's a significant portion with this stack, I'll take a look:
#01: replace_malloc /home/ting/w/fx/os/flame/gecko/memory/replace/dmd/DMD.cpp:1254 (libdmd.so+0x3900)
#02: nsSegmentedBuffer::AppendNewSegment() /home/ting/w/fx/os/flame/gecko/xpcom/io/nsSegmentedBuffer.cpp:68 (libxul.so+0x3fb4c8)
#03: nsPipe::GetWriteSegment(char*&, unsigned int&) /home/ting/w/fx/os/flame/gecko/xpcom/io/nsPipe3.cpp:820 (libxul.so+0x3fb640)
#04: nsPipeOutputStream::WriteSegments(nsresult (*)(nsIOutputStream*, void*, char*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*) /home/ting/w/fx/os/flame/gecko/xpcom/io/nsPipe3.cpp:1628 (libxul.so+0x3fd3f6)
Comment 20•9 years ago
|
||
Can this explain the behavior I have spotted in bug 1215452 ? If so, we should mark bug 1215452 as being dependant on that one.
Flags: needinfo?(janus926)
Assignee | ||
Comment 21•9 years ago
|
||
(In reply to Alexandre LISSY :gerard-majax from comment #20)
> Can this explain the behavior I have spotted in bug 1215452 ? If so, we
> should mark bug 1215452 as being dependant on that one.
Do you mean comment 18?
Flags: needinfo?(janus926)
Comment 22•9 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #21)
> (In reply to Alexandre LISSY :gerard-majax from comment #20)
> > Can this explain the behavior I have spotted in bug 1215452 ? If so, we
> > should mark bug 1215452 as being dependant on that one.
>
> Do you mean comment 18?
I mean the whole leaking, can this explain why taking an about:memory on device after 7 days takes 5 minutes instead of a few seconds ?
Flags: needinfo?(janus926)
Updated•9 years ago
|
Whiteboard: [MemShrink]
Assignee | ||
Comment 23•9 years ago
|
||
Maybe, I'm not sure. There're other things to consider, for example, 1) did you use "--minimize" to trigger GC to get rid of dead objects, 2) the heap may be very fragamented to walk through.
Flags: needinfo?(janus926)
Assignee | ||
Comment 25•9 years ago
|
||
The nsPipe of the input/output stream from nsServerSocket's client socket are leaked, trying to sort it out how.
Assignee | ||
Comment 26•9 years ago
|
||
The pipe has refcnt 1 from mOriginalInput, which has refcnt 3. Probably from marionette server.
Updated•9 years ago
|
Whiteboard: [MemShrink] → [MemShrink:P1]
Assignee | ||
Comment 27•9 years ago
|
||
I learned from QA that MTBF now reaches 48 hours with the patch of bug 1217289. Hopefully we can have bug 1220012 reviewed soon to push it further.
Comment 28•9 years ago
|
||
Over 40 hours last week. Raptor report [1][2].
[1] Flame: http://mzl.la/1Q520Ti
[2] Aries: http://mzl.la/1Q51Xqw
Updated•9 years ago
|
Flags: needinfo?(pyang)
Comment 30•9 years ago
|
||
Close per all depends on closed.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(pyang)
Flags: needinfo?(bchien)
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•