Closed
Bug 998504
Opened 11 years ago
Closed 10 years ago
[META] window.open()/.close() memory leak
Categories
(Firefox OS Graveyard :: Performance, defect, P1)
Tracking
(blocking-b2g:1.4+, b2g-v1.4 fixed, b2g-v2.0 fixed)
People
(Reporter: m1, Unassigned)
References
Details
(5 keywords, Whiteboard: [c=memory p= s= u=1.4] [cr 651835][MemShrink:P1][ETA 5/2])
Attachments
(2 files)
A memory leak is observed on v1.4 in both the main process and content process of an app that repeatedly invokes window.open()/close().
USS/RSS are fine, but VSS blows up. It seems to be a /dev/ashmem-related fd leak, as |watch ls -l /proc/<b2g_pid>/fd/| shows thousands of open file descriptors right before memory is exhausted and the content process is killed.
STR:
* Run the test app at bug 964386 attachment 8366455 [details] on a QRD device.
This leak does not reproduce with a v1.3 Gecko/Gaia on the same device/gonk/testapp.
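The fd check described in this report can be scripted. The following is a minimal sketch, assuming a Linux-style /proc/<pid>/fd directory is readable from wherever the script runs; the function name count_fd_targets is illustrative and not part of any b2g tooling.

```python
import os
from collections import Counter


def count_fd_targets(fd_dir):
    """Tally the symlink targets found in a /proc/<pid>/fd-style directory."""
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor vanished between listdir() and readlink(),
            # or the entry is not a symlink; skip it.
            continue
        counts[target] += 1
    return counts


if __name__ == "__main__":
    # Inspect this process's own descriptors where /proc is available.
    if os.path.isdir("/proc/self/fd"):
        for target, n in count_fd_targets("/proc/self/fd").most_common(10):
            print(n, target)
```

Running it repeatedly against the same pid and comparing the tallies should show which kind of descriptor is growing.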
Updated•11 years ago
Whiteboard: [cr 651835] → [cr 651835][MemShrink]
Comment 1•11 years ago
Can someone on the QC side grab an about:memory report when this bug reproduces?
Reporter | ||
Comment 2•11 years ago
An about:memory report is not helpful in this case, as the leak isn't in the normal heap this time. It looks like a file descriptor leak.
Comment 3•11 years ago
Can we confirm whether this is a CS blocker? Is it a hard 1.4 blocker?
Thanks.
Flags: needinfo?(praghunath)
Flags: needinfo?(mvines)
Reporter | ||
Comment 4•11 years ago
Absolutely. Without this bug fixed we'll never get close to the stability goals.
Flags: needinfo?(mvines)
Comment 5•11 years ago
(In reply to Kevin Hu [:khu] from comment #3)
> May I know that are we confirmed this is a CS blocker for sure? Is this a
> hard 1.4 blocker?
> Thanks.
Yes, Kevin, this is a confirmed blocker.
Flags: needinfo?(praghunath)
Comment 7•11 years ago
(In reply to Michael Vines [:m1] [:evilmachines] from comment #0)
> USS/RSS are fine, but VSS blows up. It seems to be a /dev/ashmem-related fd
> leak, as |watch ls -l /proc/<b2g_pid>/fd/| shows thousands of open file
> descriptors right before memory is exhausted and the content process is
> killed.
Can we get an about:memory dump at several points during the test app's execution?
My guess is fallout from bug 748958 + bug 962670, since AFAIK the last one is the only thing making real use of /dev/ashmem.
Reporter | ||
Comment 8•11 years ago
(In reply to Andrew Overholt [:overholt] from comment #6)
> Is it possible to get a regression range here?
Does not reproduce on v1.3. Please assign a Mozilla engineer to debug further and fix, thanks!
Flags: needinfo?(mvines)
Comment 9•11 years ago
FWIW, I was hoping for a narrower range than "since 1.4 branched to Aurora"; the fact that it does not reproduce on 1.3 already gives us at least that range.
Comment 10•11 years ago
I'll see if I can reproduce and start investigating on my nexus4 tomorrow.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Reporter | ||
Comment 11•11 years ago
Thanks, Ben. Just watch the Vss from |adb shell b2g-procrank| grow for the b2g and corresponding content process. Also notice that /proc/<pid>/fd/ grows into the thousands for both.
Looking back at some automation logs it looks like the rough regression range is:
* ftp://ftp.mozilla.org/pub/mozilla.org/b2g/manifests/nightly/1.4.0/2014/03/2014-03-13-16
* ftp://ftp.mozilla.org/pub/mozilla.org/b2g/manifests/nightly/1.4.0/2014/03/2014-03-09-16
This doesn't reproduce on a Buri, so please try with one of the newer 8x10-based devices.
Comment 12•11 years ago
Reproduced on my open-c.
Comment 13•11 years ago
Comment 14•11 years ago
Comment 15•11 years ago
I don't see any obvious leaks diff'ing those two memory reports. Some slight increase in strong observers, but not enough for 40 cycles.
The fact that this happens on newer devices but not on Buri suggests gfx fence fd issues.
Comment 16•11 years ago
I don't have convincing evidence yet that this is related to gfx fences, but adding Sotaro in case he has any suggestions on how to rule them out.
Also, the fact that we don't leak during normal painting and app usage suggests that fences might not be it.
Comment 17•11 years ago
The fd's are leaking in the parent process as well. I think that's consistent with gfx buffers, but could also indicate IPC resources or something.
Comment 18•11 years ago
Some more observations:
1) The leak still occurs if we remove 'attention' from window.open().
2) If the screen is off, the window.open() continues to fire from js, but we do not leak.
3) If the screen is on, but locked, then we do leak.
Also, if I leave the screen off for a while and then activate it, I get this for each window activation while asleep:
E/GeckoConsole( 3759): [JavaScript Error: "TypeError: Argument 1 of Node.removeChild is not an object." {file: "app://system.gaiamobile.org/js/popup_manager.js" line: 92}]
Comment 19•11 years ago
Would connecting ashmem to DMD, assuming that makes any sense, help get data about this?
Comment 20•11 years ago
(In reply to Jed Davis [:jld] from comment #19)
> Would connecting ashmem to DMD, assuming that makes any sense, help get data
> about this?
Unfortunately DMD crashed when I tried enabling it on this device.
Comment 21•11 years ago
Looking at the long listing of the process fd directory, this certainly appears to be a fence issue:
lrwx------ root root 2014-04-22 15:45 626 -> anon_inode:dmabuf
lrwx------ root root 2014-04-22 15:45 627 -> anon_inode:dmabuf
lr-x------ root root 2014-04-22 15:45 628 -> anon_inode:sync_fence
lr-x------ root root 2014-04-22 15:45 629 -> anon_inode:sync_fence
bkelly@lenir:/srv/gaia-master/apps/system/js$ adb shell ls -l /proc/9779/fd | grep fence | wc -l
247
bkelly@lenir:/srv/gaia-master/apps/system/js$ adb shell ls -l /proc/9779/fd | grep dmabuf | wc -l
508
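The grep/wc tallies above can be done in one pass over a saved listing. This is a minimal sketch, assuming the `ls -l /proc/<pid>/fd` output has been captured as text; tally_fd_listing and SAMPLE are illustrative names, and SAMPLE holds only the four entries quoted in this comment.

```python
from collections import Counter


def tally_fd_listing(listing):
    """Count fd link targets in `ls -l /proc/<pid>/fd` output."""
    counts = Counter()
    for line in listing.splitlines():
        # Each entry ends with "<fd> -> <target>"; lines without an arrow
        # (blank lines, headers) are ignored.
        _, sep, target = line.rpartition(" -> ")
        if sep:
            counts[target.strip()] += 1
    return counts


SAMPLE = """\
lrwx------ root root 2014-04-22 15:45 626 -> anon_inode:dmabuf
lrwx------ root root 2014-04-22 15:45 627 -> anon_inode:dmabuf
lr-x------ root root 2014-04-22 15:45 628 -> anon_inode:sync_fence
lr-x------ root root 2014-04-22 15:45 629 -> anon_inode:sync_fence
"""

if __name__ == "__main__":
    for target, n in tally_fd_listing(SAMPLE).most_common():
        print(n, target)
```

Grouping by the full link target, rather than grepping for one word at a time, also surfaces descriptor types nobody thought to look for.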
Comment 22•11 years ago
Sotaro, Peter, Sushil,
Do any of you have suggestions on how to track down or rule out a graphics related fence leak?
We don't leak under normal painting, just when opening/closing iframes within an existing child process. Does that hit a different code path for our locking?
Thanks!
Flags: needinfo?(sushilchauhan)
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(pchang)
Comment 23•11 years ago
Here's a profile I've been using to keep my bearings on the overall process.
http://people.mozilla.org/~bgirard/cleopatra/#report=d1c7dc5412800059414908d6ef9d37c13aa3acc5
Comment 24•11 years ago
It seems that for each window open/close cycle we are constructing 8 GrallocTextureClientOGL objects, but only destructing 3 of them.
Comment 25•11 years ago
Correction, only two are being destructed. Some instrumentation to show what I am seeing:
I/Gecko (16445): ### ### TabChild::BrowserFrameProvideWindow() start, fds:574
I/Gecko (16445): ### ### TabChild::BrowserFrameProvideWindow() end, fds:574
E/GeckoConsole(16445): Content JS LOG at app://windowtest.gaiamobile.org/js/window_test.js:19 in am_set: In set
E/GeckoConsole(16445): [JavaScript Warning: "No meta-viewport tag found. Please explicitly specify one to prevent unexpected behavioural changes in future versions. For more help https://developer.mozilla.org/en/docs/Mozilla/Mobile/Viewport_meta_tag" {file: "app://windowtest.gaiamobile.org/helloworld.html" line: 0}]
I/Gecko (16445): ### ### GrallocTextureClientOGL() start, count:171 fds:574
I/Gecko (16445): ### ### GrallocTextureClientOGL() end, count:171 fds:574
I/Gecko (16445): ### ### GrallocTextureClientOGL() start, count:172 fds:576
I/Gecko (16445): ### ### GrallocTextureClientOGL() end, count:172 fds:576
E/GeckoConsole(16445): Content JS LOG at app://windowtest.gaiamobile.org/js/helloworld.js:5 in rv_init: In helloworld page
I/Gecko (16445): ### ### GrallocTextureClientOGL() start, count:173 fds:578
I/Gecko (16445): ### ### GrallocTextureClientOGL() end, count:173 fds:578
I/Gecko (16445): ### ### GrallocTextureClientOGL() start, count:174 fds:581
I/Gecko (16445): ### ### GrallocTextureClientOGL() end, count:174 fds:581
I/Gecko (16445): ### ### GrallocTextureClientOGL() start, count:175 fds:583
I/Gecko (16445): ### ### GrallocTextureClientOGL() end, count:175 fds:583
I/Gecko (16445): ### ### GrallocTextureClientOGL() start, count:176 fds:585
I/Gecko (16445): ### ### GrallocTextureClientOGL() end, count:176 fds:585
I/Gecko (16445): ### ### GrallocTextureClientOGL() start, count:177 fds:587
I/Gecko (16445): ### ### GrallocTextureClientOGL() end, count:177 fds:587
I/Gecko (16445): ### ### GrallocTextureClientOGL() start, count:178 fds:589
I/Gecko (16445): ### ### GrallocTextureClientOGL() end, count:178 fds:589
I/Gecko (16445): ### ### ~GrallocTextureClientOGL() start, count:178 fds:591
I/Gecko (16445): ### ### ~GrallocTextureClientOGL() should not deallocate
I/Gecko (16445): ### ### ~GrallocTextureClientOGL() end, count:178 fds:591
I/Gecko (16445): ### ### ~GrallocTextureClientOGL() start, count:177 fds:591
I/Gecko (16445): ### ### ~GrallocTextureClientOGL() should not deallocate
I/Gecko (16445): ### ### ~GrallocTextureClientOGL() end, count:177 fds:591
I/Gecko (16445): ### ### TabChild::DestroyWindow() start, fds:587
I/Gecko (16445): ### ### TabChild::DestroyWindow() destroy base window, fds:587
I/Gecko (16445): ### ### TabChild::DestroyWindow() destroy widget, fds:587
I/Gecko (16445): ### ### TabChild::DestroyWindow() destroy remote frame, fds:587
I/Gecko (16445): ### ### TabChild::DestroyWindow() end, fds:590
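The constructor/destructor imbalance in a log like the one above can be tallied mechanically. This is a minimal sketch, assuming the ad hoc "### ###" instrumentation format shown in this comment; summarize_cycle is an illustrative helper, not something from the Gecko tree.

```python
import re

# Patterns for the instrumentation lines; the destructor is told apart
# from the constructor by the leading "~".
CTOR = re.compile(r"### ### GrallocTextureClientOGL\(\) start")
DTOR = re.compile(r"### ### ~GrallocTextureClientOGL\(\) start")
FDS = re.compile(r"fds:(\d+)")


def summarize_cycle(log_text):
    """Return (constructed, destructed, fd_delta) for one open/close cycle."""
    constructed = destructed = 0
    fds = []
    for line in log_text.splitlines():
        if DTOR.search(line):
            destructed += 1
        elif CTOR.search(line):
            constructed += 1
        match = FDS.search(line)
        if match:
            fds.append(int(match.group(1)))
    # fd_delta compares the first and last fd counts seen in the log.
    fd_delta = fds[-1] - fds[0] if fds else 0
    return constructed, destructed, fd_delta
```

Applied to the log in this comment, it counts 8 constructions, 2 destructions, and fds going from 574 to 590 over the cycle.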
Comment 26•11 years ago
(In reply to Ben Kelly [:bkelly] from comment #22)
> Sotaro, Peter, Sushil,
>
> Do any of you have suggestions on how to track down or rule out a graphics
> related fence leak?
>
Hi Ben,
If you have the set-up ready, can you quickly check by reverting these 2 patches:
1. https://bugzilla.mozilla.org/show_bug.cgi?id=986253
2. https://bugzilla.mozilla.org/show_bug.cgi?id=974152
Try reverting this too, only if above 2 do not help:
https://bugzilla.mozilla.org/show_bug.cgi?id=977880
Flags: needinfo?(sushilchauhan)
Comment 27•11 years ago
Sotaro,
I checked with the | adb shell lsof | grep "sync" | command. There is a huge increase in the sync_fence fd count when the Settings app is launched, and the count does not decrease even when the user returns to the Home Screen.
Reverting your patch (https://bugzilla.mozilla.org/show_bug.cgi?id=977880) fixes it.
Flags: needinfo?(sotaro.ikeda.g)
Updated•11 years ago
Whiteboard: [cr 651835][MemShrink] → [cr 651835][MemShrink:P1]
Comment 28•11 years ago
Reverting the patch mentioned in comment 27 hugely reduces the sync_fence fd count. Reverting the 2 patches below reduces it further. Use case: launch and exit the Settings or Video app:
1. https://bugzilla.mozilla.org/show_bug.cgi?id=986253
2. https://bugzilla.mozilla.org/show_bug.cgi?id=974152
Updated•11 years ago
Flags: needinfo?(sotaro.ikeda.g)
Comment 30•11 years ago
Thanks Sotaro! Mind if I pass the assignment to you for now? Feel free to pass back if it ends up not being in your court.
Assignee: bkelly → sotaro.ikeda.g
Updated•11 years ago
Component: Performance → Graphics: Layers
Product: Firefox OS → Core
Version: unspecified → 30 Branch
Comment 31•11 years ago
A Fence's lifetime is tied to the lifetime of its TextureHost/TextureClient. My current assumption is that a TextureHost/TextureClient leak also causes the Fence leak.
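To illustrate why a leaked owner implies a leaked descriptor, here is a toy model rather than Gecko code. It assumes each owner object holds exactly one fd and closes it only when released; ToyTextureClient, is_open and leaked_after_cycle are hypothetical names, and the read end of a pipe stands in for a fence fd.

```python
import os


class ToyTextureClient:
    """Hypothetical stand-in for an object that owns one fence-like fd."""

    def __init__(self):
        # The read end of a pipe plays the role of the fence fd (assumption).
        self.fence_fd, write_end = os.pipe()
        os.close(write_end)

    def release(self):
        """Close the owned fd, as a destructor would."""
        if self.fence_fd is not None:
            os.close(self.fence_fd)
            self.fence_fd = None


def is_open(fd):
    """Report whether fd still refers to an open descriptor."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def leaked_after_cycle(created, released):
    """Create `created` owners, release only `released`, count open fds."""
    clients = [ToyTextureClient() for _ in range(created)]
    fds = [client.fence_fd for client in clients]
    for client in clients[:released]:
        client.release()
    leaked = sum(1 for fd in fds if is_open(fd))
    # Clean up so the sketch itself does not leak descriptors.
    for client in clients:
        client.release()
    return leaked
```

With the numbers from comment 25 (8 constructed, 2 destructed), leaked_after_cycle(8, 2) leaves 6 descriptors open per cycle in this model; the real per-cycle fd growth differs because each real object can hold more than one fd.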
Updated•11 years ago
Whiteboard: [cr 651835][MemShrink:P1] → [cr 651835][MemShrink:P1][c= p= s= u=]
Updated•11 years ago
Whiteboard: [cr 651835][MemShrink:P1][c= p= s= u=] → [cr 651835][MemShrink:P1][c=memory p= s= u=]
Comment 32•11 years ago
I confirmed the increase in GrallocTextureClientOGL instances on a v1.4 nexus-4. It does not happen on a master nexus-4.
Flags: needinfo?(sotaro.ikeda.g)
Comment 33•11 years ago
With tiling disabled, the number of GrallocTextureClientOGL instances did not increase.
Comment 34•11 years ago
It is now clear that the GrallocTextureClientOGL leak is addressed by Bug 982339.
Comment 35•11 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #34)
> It becomes clear that the GrallocTextureClientOGL leak happens by Bug 982339.
To apply Bug 982339 on b2g v1.4, Bug 985302 is also necessary.
Comment 36•11 years ago
After applying Bug 982339 and Bug 985302, I no longer see the file descriptor count increase. But while the test is running, the app stops working because of a pipe error on IPC.
Comment 37•11 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #36)
> After applying Bug 982339 and Bug 985302, I did not see the increase of file
> descriptors. But during running the test, the app stop to work because of
> pipe error on IPC.
The same happens on master b2g.
Comment 38•11 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #37)
> (In reply to Sotaro Ikeda [:sotaro] from comment #36)
> > After applying Bug 982339 and Bug 985302, I did not see the increase of file
> > descriptors. But during running the test, the app stop to work because of
> > pipe error on IPC.
>
> It is same to master b2g.
Created Bug 1000525 to handle the problem in Comment 36.
Updated•11 years ago
Flags: needinfo?(pchang)
Updated•11 years ago
Whiteboard: [cr 651835][MemShrink:P1][c=memory p= s= u=] → [c=memory p= s= u=1.4] [cr 651835][MemShrink:P1]
Comment 39•11 years ago
Sotaro,
Can you please confirm that bug 1000525 fixes the issue here?
What would be pending once bug 1000525 lands?
Flags: needinfo?(sotaro.ikeda.g)
Comment 40•11 years ago
We also need bug 1004191.
Flags: needinfo?(sotaro.ikeda.g)
Whiteboard: [c=memory p= s= u=1.4] [cr 651835][MemShrink:P1] → [c=memory p= s= u=1.4] [cr 651835][MemShrink:P1][ETA 5/2]
Comment 41•11 years ago
With Bug 982339 and Bug 1004191, the memory leak around gfx layers seems to be fixed. But there are still leaks in other areas. While running the test, I still saw the WindowTest app's VSS and RSS increase. One thing I noticed is that TabChild is leaking during the test: TabChild's destructor was not called.
Comment 42•11 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #41)
> One thing I recognized is
> that TabChild is leaking during test. TabChild's destructor was not called.
I confirmed that TabChild::RecvDestroy() is called.
Comment 43•11 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #41)
> By Bug 982339 and Bug 1004191, the memory leak around gfx layers seems to be
> fixed. But there are still other area's leak. During running the test, I
> still saw WindowTest app's VSS and RSS increase. One thing I recognized is
> that TabChild is leaking during test. TabChild's destructor was not called.
Created Bug 1004630 for the TabChild leak.
Updated•11 years ago
Component: Graphics: Layers → Performance
Product: Core → Firefox OS
Version: 30 Branch → unspecified
Comment 44•11 years ago
A separate bug has been filed for each of the gfx leaks, and all of them are close to being fixed. Returning this bug's component to Firefox OS Performance.
Updated•11 years ago
Assignee: sotaro.ikeda.g → nobody
Comment 45•11 years ago
I no longer see the leak around gfx layers after locally applying the in-progress fixes.
Comment 46•11 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #45)
> I do not saw the leak around gfx layers after locally applying the ongoing
> fixes.
Hi Sotaro,
Which local ongoing fixes are we using?
Please add bug numbers. Can this blocker be closed now?
Flags: needinfo?(milan)
Comment 47•10 years ago
This can't be closed until the blocking bugs are closed. One remaining graphics bug, bug 1004191, is close to landing, but DOM bug 1004630 is also needed.
Flags: needinfo?(milan)
Comment 48•10 years ago
Changing the title to include META so that it is clear why this isn't assigned to anybody specific.
Keywords: meta
Summary: window.open()/.close() memory leak → [META] window.open()/.close() memory leak
Comment 49•10 years ago
(In reply to Milan Sreckovic [:milan] from comment #40)
> We also need bug 1004191.
Not anymore.
No longer depends on: 1004191
Comment 50•10 years ago
Let's see if we can close this. All the dependent bugs have landed on all relevant branches (note that bug 1004630 didn't need an uplift to b2g30, as it was introduced in 31). So, this should now be fixed in 1.4 (30), and 2.0 (32) (also at 31, but there is no b2g version for that.)
With that in mind, if we can repeat the original test, it would help us confirm the fix, or see if there is something we missed. Michael, can you arrange for that?
Flags: needinfo?(mvines)
Reporter | ||
Comment 51•10 years ago
I just checked our automation and this test is now green again over here so looks like we're done. Thanks!
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(mvines)
Resolution: --- → FIXED
Updated•10 years ago
Flags: in-moztrap?(ychung)
Updated•10 years ago
Flags: in-moztrap?(ychung) → in-moztrap?(rmead)
Comment 52•10 years ago
No STR is present from which to create a test case for this bug.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Updated•10 years ago
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(rmead)
Flags: in-moztrap-