Bug 1245241 (Closed)
Opened 9 years ago, Closed 9 years ago
(e10s) Crash when loading https://treeherder.mozilla.org/#/jobs?repo=mozilla-central
Categories: Core :: Graphics, defect
Status: RESOLVED FIXED
Target Milestone: mozilla47
People: Reporter: gerard-majax, Assigned: lsalzman
Keywords: crash
Attachments (7 files, 3 obsolete files):
- (deleted), text/plain
- (deleted), patch, billm: review+
- (deleted), patch, billm: review+
- (deleted), patch, billm: review+
- (deleted), patch, nical: review+
- (deleted), patch, lsalzman: review+
- (deleted), patch, lsalzman: review+
For at least a week, I have had a constantly reproducible crash when loading the treeherder page.
See crash reports:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=%400x40c90c
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=%400x40c91c
https://crash-stats.mozilla.com/report/index/a34f3b01-df10-4f04-9a11-dbdf62160201
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3Anet%3A%3AnsHttpConnectionMgr%3A%3AnsHalfOpenSocket%3A%3AOnOutputStreamReady
Reporter, Comment 1 • 9 years ago
This is 100% reproducible on my profile with up-to-date nightlies. A debug build with a clean profile shows no crash.
Reporter, Comment 2 • 9 years ago
Disabling all the addons on my profile does not help.
Reporter, Comment 3 • 9 years ago
Unable to reproduce on my profile when running Nightly (from mozilla.org) under GDB :/
Reporter, Comment 4 • 9 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #1)
> This is 100% repro on my profile with uptodate nightlies. A debug build with
> a clean profile shows no crash.
OK, the difference is loading mozilla-inbound (does not crash) vs. mozilla-central (crashes). When loading mozilla-central with a local debug build, the crash is reproduced. When loading inbound with a release Nightly and my profile, no problem.
Reporter, Updated • 9 years ago
Summary: Crash when loading treeherder.mozilla.org → Crash when loading https://treeherder.mozilla.org/#/jobs?repo=mozilla-central
Reporter, Comment 5 • 9 years ago
My local debug build has no symbols, so for now the gdb backtrace is meaningless.
Reporter, Comment 6 • 9 years ago
Flags: needinfo?(nical.bugzilla)
Reporter, Comment 7 • 9 years ago
This is on Ubuntu 15.10, and nothing in my setup has changed.
Reporter, Comment 8 • 9 years ago
Looks like this is not reproducible in a non-e10s window.
Summary: Crash when loading https://treeherder.mozilla.org/#/jobs?repo=mozilla-central → (e10s) Crash when loading https://treeherder.mozilla.org/#/jobs?repo=mozilla-central
Comment 9 • 9 years ago
That's a failure to allocate shared memory on the client side. Usually this means we're out of memory. Treeherder is (in my experience) pretty heavy on both JS and gfx so it's not too surprising that we hit the limits on this page more often than on other pages if we are already close to the edge (which might not be the case, though).
Flags: needinfo?(nical.bugzilla)
Reporter, Comment 10 • 9 years ago
With 20GB RAM (at least 12GB-13GB free when testing) plus 16GB of swap (mostly unused) ?
Milan, we're seeing a lot of people crashing pretty reliably on treeherder lately. Bug 1245679 is another issue. Can you get someone to look into this? It's hitting a lot of Mozilla devs.
Flags: needinfo?(milan)
Will do, but we're thin on the Linux side, with some 45 blockers in the pipeline, and :nical away until 2/15. This is Linux-only, right?
Flags: needinfo?(nical.bugzilla)
Updated • 9 years ago
tracking-e10s:
--- → ?
Updated • 9 years ago
I can reproduce this pretty easily by doing |ulimit -n 300| and then loading treeherder in a new Firefox session. I'm not sure why other people are getting up to 1000, but maybe my computer is just a little faster or something.
So, is ulimit -n 2000 a workaround?
Flags: needinfo?(milan)
Comment 15 • 9 years ago
(In reply to Milan Sreckovic [:milan] from comment #14)
> So, is ulimit -n 2000 a workaround?
Not really. I found on my machine 4096 wasn't high enough; su'ing to root, bumping to 32768, then su'ing back to me and running avoids it. The spike can be very large, depending heavily on timing/loading/phase-of-the-moon. And we can't ask people to bump the fd limit just to run Firefox, of course.
rr in normal mode doesn't crash. rr using the 'chaos' branch did crash for me. Even gdb with a breakpoint with an "ignore N 500" on shmem allocation stopped it from failing.
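For readers following along: the fd ceiling discussed above is the process's RLIMIT_NOFILE soft limit, the same number |ulimit -n| reports. As a minimal standalone sketch (illustrative only, not Firefox code), it can be inspected with POSIX getrlimit:

```cpp
#include <sys/resource.h>

// Return the soft limit on open file descriptors (RLIMIT_NOFILE),
// i.e. the value `ulimit -n` shows, or 0 if the query fails.
static unsigned long FdSoftLimit() {
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
    return 0;
  }
  return static_cast<unsigned long>(rl.rlim_cur);
}
```

Raising the soft limit from code uses setrlimit, but raising the hard limit requires root privileges, which matches the su-to-root dance described above.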
Updated • 9 years ago
Component: General → Graphics
Product: Firefox → Core
Comment 16 • 9 years ago
I've found that with 1024 on my system, the beta repo in treeherder is almost an instant crash (in a local linux64 opt build).
This appears to have been regressed by bug 1180942. Milan, can we back that out until Nical returns?
Also, Randell, can you confirm that setting layers.use-image-offscreen-surfaces to false makes the problem go away?
Flags: needinfo?(rjesup)
Flags: needinfo?(milan)
Comment 20 • 9 years ago
It seems like we should be able to just close the files after mapping them to avoid hitting the limit.
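The idea suggested here relies on a POSIX guarantee: a memory mapping stays valid after its file descriptor is closed, so the fd need not be held for the lifetime of the segment. A hedged standalone sketch (illustrative POSIX code, not the actual Shmem patch; the segment name is made up, and shm_open may need -lrt on older glibc):

```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Map `size` bytes of POSIX shared memory and close the descriptor
// immediately: the kernel keeps the segment alive until munmap().
static void* MapAndCloseFd(size_t size) {
  int fd = shm_open("/bug1245241-demo", O_CREAT | O_RDWR, 0600);
  if (fd < 0) {
    return nullptr;
  }
  shm_unlink("/bug1245241-demo");  // name no longer needed once fd exists
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    return nullptr;
  }
  void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the key step: stop holding an fd per live segment
  return mem == MAP_FAILED ? nullptr : mem;
}

// Demonstrate that the mapping still works after the fd is gone.
static bool DemoWriteAfterClose() {
  void* mem = MapAndCloseFd(4096);
  if (!mem) {
    return false;
  }
  memcpy(mem, "ok", 3);
  bool good = strcmp(static_cast<char*>(mem), "ok") == 0;
  munmap(mem, 4096);
  return good;
}
```

The trade-off, as the later comments note, is that once the fd is closed the segment can no longer be re-shared by passing that fd along, which is why the unused adopt and SysV paths get removed in the follow-up patches.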
Comment 21 • 9 years ago
(In reply to Bill McCloskey (:billm) from comment #17)
> This appears to have been regressed by bug 1180942. Milan, can we back that
> out until Nical returns?
>
> Also, Randell, can you confirm that setting
> layers.use-image-offscreen-surfaces to false makes the problem go away?
Yes - instacrash with it true, no crash with it false.
Flags: needinfo?(rjesup)
Reporter, Comment 22 • 9 years ago
(In reply to Bill McCloskey (:billm) from comment #17)
> This appears to have been regressed by bug 1180942. Milan, can we back that
> out until Nical returns?
>
> Also, Randell, can you confirm that setting
> layers.use-image-offscreen-surfaces to false makes the problem go away?
I don't see that pref being defined at all on my system. Also, toggling gfx.xrender.enabled to false still crashes on treeherder mozilla-central.
Comment 23 • 9 years ago
> > Also, Randell, can you confirm that setting
> > layers.use-image-offscreen-surfaces to false makes the problem go away?
>
> I don't see that pref being defined at all on my system. Also, toggling
> gfx.xrender.enabled to false still crashes on treeherder mozilla-central.
You have to use New->boolean in about:config. I don't crash with the layers pref set; I'm not sure the gfx pref has the same effect.
Sounds like we should re-enable xrender, disable layers.use-image-offscreen-surfaces, reopen relevant bugs (bug 1180942, etc.) until we sort this out. Lee or Jeff, can you do this?
Flags: needinfo?(milan)
Flags: needinfo?(lsalzman)
Flags: needinfo?(jmuizelaar)
Comment 25 • 9 years ago
It looks like we're close to improving the situation properly, and layers.use-image-offscreen-surfaces has only been true since Thu Jan 21, so hopefully the problem isn't too widespread. If we can't find a fix by the end of the week we'll revert the change.
Flags: needinfo?(jmuizelaar)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #25)
> It looks like we're close to improving the situation properly
How so? Is there a different bug?
Comment 27 • 9 years ago
Lee has a patch to close the file descriptor for a mapping after we map:
https://hg.mozilla.org/try/rev/a8cf60d1410b
It seems to solve the problem but we're running into a different problem with it on try.
Assignee, Comment 28 • 9 years ago
The main side effect of this patch is that it closes fds after mmaping, which allows us to function even with much lower fd limits.
In the process of trying to figure out the shmem code, it was hard to understand what was going on with the same code duplicated in four places (debug vs. non-debug, basic vs. SysV).
So I attempted to remove a lot of the static dispatching and replace it with better use of virtuals, so that it is now much easier to have different types of shmems.
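The refactoring direction described here, replacing duplicated static dispatch with a small virtual interface, might look roughly like this sketch (hypothetical class and member names; the real patch lives in ipc/glue/SharedMemory.h and uses different details):

```cpp
#include <cstddef>

// Minimal sketch of one shmem interface with per-type virtuals,
// instead of the same logic duplicated for each shmem flavor.
class SharedMemoryBase {
 public:
  virtual ~SharedMemoryBase() = default;
  virtual bool Create(size_t size) = 0;
  virtual bool Map(size_t nBytes) = 0;
  // Pure virtual, as suggested in review: every concrete type must
  // decide how to release its underlying handle after mapping.
  virtual void CloseHandle() = 0;
};

// One concrete flavor; others (e.g. a SysV variant) would subclass too.
class BasicSharedMemory final : public SharedMemoryBase {
 public:
  bool Create(size_t size) override { mSize = size; return size > 0; }
  bool Map(size_t nBytes) override { mMapped = nBytes <= mSize; return mMapped; }
  void CloseHandle() override { mHandleOpen = false; }
  bool HandleOpen() const { return mHandleOpen; }

 private:
  size_t mSize = 0;
  bool mMapped = false;
  bool mHandleOpen = true;
};
```

With this shape, callers go through the base interface and each shmem type supplies its own Create/Map/CloseHandle, which is the "better use of virtuals" the comment describes.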
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Flags: needinfo?(lsalzman)
Attachment #8719046 - Flags: review?(wmccloskey)
Assignee, Comment 29 • 9 years ago
Fix accidental widget/gtk bug regarding SysV shm.
Attachment #8719046 - Attachment is obsolete: true
Attachment #8719046 - Flags: review?(wmccloskey)
Attachment #8719106 - Flags: review?(wmccloskey)
Comment on attachment 8719106 [details] [diff] [review]
Close Shmem file handles after mapping them when possible to reduce exhaustion issues.
Review of attachment 8719106 [details] [diff] [review]:
-----------------------------------------------------------------
This looks pretty good, but I see two issues:
1. We have an unused set of functions for adopting shared memory. Those will break with this patch, I think, because they depend on the fd remaining open. So I think you should remove the code to support this.
2. Although your new way of handling SYSV shared memory is nicer, we don't actually use SYSV shared memory anymore. So let's remove that too.
::: ipc/chromium/src/base/shared_memory_posix.cc
@@ +257,5 @@
> }
>
>
> +void SharedMemory::Close(bool unmap_view) {
> + if(unmap_view) {
Should be |if (unmap_view)| (need a space).
::: ipc/glue/SharedMemory.h
@@ +57,5 @@
>
> virtual bool Create(size_t size) = 0;
> virtual bool Map(size_t nBytes) = 0;
>
> + virtual void Close() {}
Could this be = 0 instead?
Attachment #8719106 - Flags: review?(wmccloskey)
Comment 31 • 9 years ago
Can we please resolve this? Permacrashing on treeherder is getting annoying.... ;-)
Assignee, Comment 32 • 9 years ago
Fixed the formatting error, made Close pure virtual, and renamed it to CloseHandle for clarity's sake.
Attachment #8719106 - Attachment is obsolete: true
Attachment #8720384 - Flags: review?(wmccloskey)
Assignee, Comment 33 • 9 years ago
This removes the ability to create TYPE_SYSV Shmems with IPDL and associated tests.
SharedMemorySysV is still used by widget/gtk but that is removed in a further patch.
Attachment #8720385 - Flags: review?(wmccloskey)
Assignee, Comment 34 • 9 years ago
We're not actually using AdoptShmem anywhere at all, and it becomes trickier to use as a result of this file handle trickery... so here we just remove it instead, as requested.
Attachment #8720386 - Flags: review?(wmccloskey)
Assignee, Comment 35 • 9 years ago
Since SharedMemorySysV is no longer used in IPDL or anywhere else in ipc, besides its sole use in nsShmImage for widget/gtk, the few lines of code for actually setting up SysV shm segments should just be moved into nsShmImage itself. This way we can remove SharedMemorySysV entirely.
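For context on why the SysV path is small enough to inline: setting up a SysV segment really is only a handful of calls. A hedged standalone sketch (illustrative only, not the nsShmImage code):

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstddef>
#include <cstring>

// Create a private SysV segment, attach it, and mark it for removal so
// it disappears once detached: the same short sequence a widget backend
// needs, with no wrapper class required.
static bool SysvRoundTrip(size_t size) {
  int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
  if (id < 0) {
    return false;  // SysV shm unavailable or limit hit
  }
  void* mem = shmat(id, nullptr, 0);
  // IPC_RMID immediately: the segment lives until the last detach,
  // so it is not leaked even if we crash past this point.
  shmctl(id, IPC_RMID, nullptr);
  if (mem == reinterpret_cast<void*>(-1)) {
    return false;
  }
  memcpy(mem, "sysv", 5);
  bool good = strcmp(static_cast<char*>(mem), "sysv") == 0;
  shmdt(mem);
  return good;
}
```

The immediate IPC_RMID is the usual hygiene for SysV segments, since unlike POSIX shm they otherwise persist system-wide after the process exits.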
Attachment #8720428 - Flags: review?(nical.bugzilla)
Attachment #8720386 - Flags: review?(wmccloskey) → review+
Attachment #8720384 - Flags: review?(wmccloskey) → review+
Attachment #8720385 - Flags: review?(wmccloskey) → review+
Updated • 9 years ago
Attachment #8720428 - Flags: review?(nical.bugzilla) → review+
Assignee, Comment 36 • 9 years ago
Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a42e2bd00c6
The pre-existing mochitest gl perma-orange is not from this change.
Comment 37 • 9 years ago
Comment 38 • 9 years ago, bugherder
https://hg.mozilla.org/mozilla-central/rev/da603bc85e3f
https://hg.mozilla.org/mozilla-central/rev/7c656da2e681
https://hg.mozilla.org/mozilla-central/rev/48fa8a92f936
https://hg.mozilla.org/mozilla-central/rev/94f9bb9e3253
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox47:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Reporter, Comment 40 • 9 years ago
Nightly with https://hg.mozilla.org/mozilla-central/rev/69ec3dc408a2a720cb2b8210fea33e3504aeec22 (which seems to contain the patches from this bug) and I am still experiencing crashes.
Flags: needinfo?(lsalzman)
Reporter, Comment 41 • 9 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #40)
> Nightly with
> https://hg.mozilla.org/mozilla-central/rev/
> 69ec3dc408a2a720cb2b8210fea33e3504aeec22 (which seems to contain the patches
> from this bug) and I am still experiencing crashes.
Crash report: bp-3d8c199d-c418-4e82-82b1-64c852160220
Reporter, Comment 42 • 9 years ago
Nightly built from https://hg.mozilla.org/mozilla-central/rev/af6356a3e8c56036b74ba097395356d9c6e6c5a3, still crashing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter, Comment 43 • 9 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #42)
> Nightly built from
> https://hg.mozilla.org/mozilla-central/rev/
> af6356a3e8c56036b74ba097395356d9c6e6c5a3, still crashing.
Sadly, I do not get any tab crash report dialog to submit anything.
Comment 44 • 9 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #43)
> (In reply to Alexandre LISSY :gerard-majax from comment #42)
> > Nightly built from
> > https://hg.mozilla.org/mozilla-central/rev/
> > af6356a3e8c56036b74ba097395356d9c6e6c5a3, still crashing.
>
> Sadly, I do not get any tab crash report dialog to submit anything.
Was this a local build? If so can you attach gdb and get a stack? If it's not a local build can you file a bug about not getting a tab crash dialog?
Flags: needinfo?(lissyx+mozillians)
Reporter, Comment 46 • 9 years ago
Just to be clear, the tab turns to "Hm, that tab crashed" but I don't get the UI to send a report.
Flags: needinfo?(jmuizelaar)
Comment 47 • 9 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #46)
> Just to be clear, the tabs turns to "Hm that tab crashed" but I don't get
> the UI to send a report.
That still sounds like a bug to me.
Flags: needinfo?(jmuizelaar)
Reporter, Comment 48 • 9 years ago
Filed as bug 1249995
Updated • 9 years ago
Assignee, Comment 49 • 9 years ago
Flags: needinfo?(lsalzman)
Attachment #8722108 - Flags: review?(wmccloskey)
Attachment #8722108 - Flags: review?(wmccloskey) → review+
Assignee, Comment 50 • 9 years ago
Small fix to check against -1 instead of 0.
Attachment #8722108 - Attachment is obsolete: true
Attachment #8722127 - Flags: review+
Comment 51 • 9 years ago
Comment 52 • 9 years ago
Lee's going to try to turn off xrender to prevent further suffering while we try to figure this out.
We're still running out of fds. I can crash pretty repeatedly by zooming out a lot and then loading mozilla-central in treeherder. If I bump the fd limit to 60000, I don't crash.
Here's my guess about what's happening here:
When we load treeherder, we send a lot of shared memory segments from the child process to the parent. The parent first reads these messages on the I/O thread and then passes them to the compositor thread. When loading treeherder, the compositor thread seems to be really busy doing stuff in Cairo. I don't know what it's doing, but the main thread of the chrome process is locked for several seconds in SendUpdate waiting for the compositor.
File descriptors are initially used up when messages are read on the I/O thread. They don't get closed until the message is passed to the compositor thread (which is when ReadSegment will be called). So this is basically a case where the content process and the I/O thread get ahead of the compositor thread.
It would really help me if someone could explain why we're quickly creating and then destroying so many textures. My understanding is that the textures for layers live as long as the layer. Are these layers getting immediately destroyed, or is this a different sort of texture?
Comment 55 • 9 years ago, bugherder
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Comment 56 • 9 years ago
My understanding is that we are still having issues on treeherder with this patch. I am hacking something that will throttle shmem creation past a certain number by sending a sync message to the compositor. It should fix the remaining issues, and if it doesn't work we'll re-enable xrender.
Status: RESOLVED → REOPENED
Flags: needinfo?(nical.bugzilla)
Resolution: FIXED → ---
Comment 57 • 9 years ago
This fixes the issue for me locally. It makes sure we don't allocate more than 256 shmems within a layers transaction without at least waiting for the compositor, and spits out a performance warning when the limit is hit.
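The throttling idea can be sketched independently of the layers code: count shmem allocations per transaction and force a synchronous rendezvous once a threshold is crossed (256, as in the patch). This is a hypothetical stand-in, with the callback substituting for the real SendSyncWithCompositor IPC call:

```cpp
#include <functional>
#include <utility>

// Sketch of a per-transaction shmem throttle: after kLimit allocations
// without a compositor sync, force one and reset the counter.
class ShmemThrottle {
 public:
  explicit ShmemThrottle(std::function<void()> aSyncWithCompositor)
    : mSync(std::move(aSyncWithCompositor)) {}

  // Call on every shmem allocation within a layers transaction.
  void ShmemAllocated() {
    if (++mCounter > kLimit) {
      mSync();       // block until the compositor catches up
      mCounter = 0;  // start counting afresh for the rest of the transaction
    }
  }

  // Call when a transaction completes normally.
  void ResetShmemCounter() { mCounter = 0; }

 private:
  static constexpr unsigned kLimit = 256;
  std::function<void()> mSync;
  unsigned mCounter = 0;
};
```

The effect is to bound how far the content side can run ahead of the compositor in shmem (and hence fd) consumption, which directly addresses the producer-outpacing-consumer analysis in comment 54.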
Attachment #8722566 - Flags: review?(lsalzman)
Assignee, Updated • 9 years ago
Attachment #8722566 - Flags: review?(lsalzman) → review+
Comment 58 • 9 years ago
Comment on attachment 8722566 [details] [diff] [review]
Force the main thread to sync with the compositor when it tries to allocate silly amounts of shmems
Review of attachment 8722566 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/ShadowLayers.cpp
@@ +49,5 @@
> +
> +static void ShmemAllocated(LayerTransactionChild* aProtocol)
> +{
> + sShmemCreationCounter++;
> + if (sShmemCreationCounter > 256) {
As the dust settles, if this stays around, make it a pref?
@@ +51,5 @@
> +{
> + sShmemCreationCounter++;
> + if (sShmemCreationCounter > 256) {
> + aProtocol->SendSyncWithCompositor();
> + ResetShmemCounter();
Should we reset the counter before or after making this sync call? I guess it doesn't really matter.
Also (in a followup bug), we may want to consider reporting this beyond the performance warning, perhaps in telemetry.
Assignee, Comment 61 • 9 years ago
(In reply to Pulsebot from comment #58)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/6cb44006032d
Bill, can you see if this resolves the issue for you?
Flags: needinfo?(wmccloskey)
It's not crashing for me with nical's patch!
Flags: needinfo?(wmccloskey)
Comment 63 • 9 years ago, bugherder
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Reporter, Comment 64 • 9 years ago
Not crashing anymore with the Nightly from Feb 25.
Updated • 7 years ago