Closed
Bug 1384701
Opened 7 years ago
Closed 7 years ago
Stylo: Linux debug parallel reftest / mochitest runs crash on shutdown
Categories
(Core :: CSS Parsing and Computation, enhancement, P2)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: jryans, Assigned: karlt)
References
Details
Attachments
(3 files)
See all 8 crashing reftest runs here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6813000f2a777de4ed8d410522ef9dc4bf8f6de5&filter-searchStr=test-linux64-stylo%2Fdebug%20tc-R(
An example log at the crash:
https://treeherder.mozilla.org/logviewer.html#?job_id=117730537&repo=try&lineNumber=39514
The tests themselves seem to pass, but then we crash on shutdown, perhaps because of the assertion "firefox: ../../../../src/cairo-hash.c:217: _cairo_hash_table_destroy: Assertion `hash_table->live_entries == 0' failed."
Reporter | ||
Comment 1•7 years ago
|
||
:bholley, this is blocking the work to test on more desktop platforms. We haven't been testing the configuration of Stylo + e10s off + parallel traversal correctly, which I ran into while working on bug 1380053.
What's the right way to proceed here? Do we want to fix it even though it's e10s off only? If so, who is best placed to do so? Or else should we disable this test configuration?
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(bobbyholley)
Reporter | ||
Comment 2•7 years ago
|
||
I'll try disabling these runs for now to unblock the other platforms at least.
Reporter | ||
Comment 3•7 years ago
|
||
Mochitest style runs are also affected.
Summary: Stylo: Linux64 debug e10s-off parallel reftest runs crash on shutdown → Stylo: Linux64 debug e10s-off parallel reftest / mochitest runs crash on shutdown
Reporter | ||
Comment 4•7 years ago
|
||
Looks like the crashing thread goes through cairo, which then tries to call "firefox!arena_dalloc [mozjemalloc.cpp:6813000f2a77 : 1115 + 0x5]" which appears to fail.
:bholley and I discussed on IRC that we'll disable this configuration for now to unblock other platforms while we investigate a fix here.
:glandium, any ideas about the cairo / jemalloc issue here? It only appears to happen for e10s-off + parallel Stylo + debug runs on Linux64, so necessary options include:
STYLO_THREADS=4
STYLO_FORCE_ENABLED=1
--disable-e10s
(debug build)
Flags: needinfo?(bobbyholley) → needinfo?(mh+mozilla)
Reporter | ||
Comment 5•7 years ago
|
||
Just to emphasize it: we aren't testing this correctly on try currently, because we were not setting STYLO_THREADS to 4 when e10s was off. So we'll need to double-check the try runs when testing any fix to be sure we've covered the right thing here.
Updated•7 years ago
|
Priority: -- → P2
Reporter | ||
Comment 6•7 years ago
|
||
Now that bug 1380053 has landed (which disabled the failing configs here), you can re-enable them for testing on try by searching tests.yml[1] for the lines mentioning this bug and removing them so that e10s-off runs as well.
[1]: https://hg.mozilla.org/mozilla-central/diff/30fb5403e45c/taskcluster/ci/test/tests.yml
Updated•7 years ago
|
Priority: P2 → --
Reporter | ||
Updated•7 years ago
|
Priority: -- → P4
Reporter | ||
Comment 7•7 years ago
|
||
The same issue also seems to occur in a parallel _e10s on_ run of the DevTools test devtools/client/responsive.html/test/browser/browser_window_close.js.
Comment 8•7 years ago
|
||
After some painful experience with a one-click loaner (most of which revolves around bug 1386946), I managed to get a gdbserver attached to the crashing firefox, which just showed that our crash reporter can't really handle to get a good stack trace from system libraries, even when the debug packages are installed (incidentally, libc's was, but not libcairo's), and the trace in the logs is just quoting irrelevant things.
Here the stack trace from gdb's perspective:
Thread 1 "firefox" received signal SIGABRT, Aborted.
0x00007ffff6b79428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff6b79428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff6b7b02a in __GI_abort () at abort.c:89
#2 0x00007ffff6b71bd7 in __assert_fail_base (fmt=<optimized out>,
assertion=assertion@entry=0x7ffff343eaad "hash_table->live_entries == 0",
file=file@entry=0x7ffff343ea90 "../../../../src/cairo-hash.c", line=line@entry=217,
function=function@entry=0x7ffff343ec70 <__PRETTY_FUNCTION__.10447> "_cairo_hash_table_destroy")
at assert.c:92
#3 0x00007ffff6b71c82 in __GI___assert_fail (
assertion=assertion@entry=0x7ffff343eaad "hash_table->live_entries == 0",
file=file@entry=0x7ffff343ea90 "../../../../src/cairo-hash.c", line=line@entry=217,
function=function@entry=0x7ffff343ec70 <__PRETTY_FUNCTION__.10447> "_cairo_hash_table_destroy")
at assert.c:101
#4 0x00007ffff33975bc in _cairo_hash_table_destroy (hash_table=<optimized out>)
at ../../../../src/cairo-hash.c:217
#5 0x00007ffff33cd697 in _cairo_scaled_font_map_destroy ()
at ../../../../src/cairo-scaled-font.c:444
#6 0x00007ffff338ed69 in cairo_debug_reset_static_data () at ../../../../src/cairo-debug.c:67
#7 0x00007fffe8a563d7 in MOZ_gdk_display_close(_GdkDisplay*) ()
from target:/home/worker/workspace/build/application/firefox/libxul.so
#8 0x00007fffe8a5e90c in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) ()
from target:/home/worker/workspace/build/application/firefox/libxul.so
#9 0x00007fffe8a5ea80 in XRE_main(int, char**, mozilla::BootstrapConfig const&) ()
from target:/home/worker/workspace/build/application/firefox/libxul.so
#10 0x00000000004064b6 in do_main(int, char**, char**) ()
#11 0x0000000000405c5c in main ()
Which says two things:
- yes, the assertion "firefox: ../../../../src/cairo-hash.c:217: _cairo_hash_table_destroy: Assertion `hash_table->live_entries == 0' failed." is what's causing it
- no, jemalloc has nothing to do with it.
Flags: needinfo?(mh+mozilla)
Comment 9•7 years ago
|
||
I see this assertion all the time locally with --disable-e10s on linux64.
Comment 10•7 years ago
|
||
See also bug 1387490.
Reporter | ||
Updated•7 years ago
|
Comment 12•7 years ago
|
||
Jonathan, do you know what the cause of this assertion (_cairo_scaled_font_map_destroy is not empty) generally is? Is it leaking a gfxFontEntry?
Flags: needinfo?(jfkthame)
Comment 13•7 years ago
|
||
Given this blocks more mochitests, it should pretty much have a higher priority.
Priority: P4 → P2
Comment 14•7 years ago
|
||
(In reply to Cameron McCormack (:heycam) from comment #12)
> Jonathan, do you know what the cause of this assertion
> (_cairo_scaled_font_map_destroy is not empty) generally is? Is it leaking a
> gfxFontEntry?
Yes, that seems likely. It might be interesting to catch this in a debugger (rr?) and see exactly what entry is left there during shutdown.
(I tried to reproduce this and bug 1387490 in a local linux64 build with stylo, but so far I'm finding them elusive here...)
Flags: needinfo?(jfkthame)
Comment 15•7 years ago
|
||
Are you passing --disable-e10s? That makes it happen consistently for me.
Flags: needinfo?(jfkthame)
Comment 16•7 years ago
|
||
Yes, I'm using --disable-e10s.
I did get the assertion once after loading the testcase from bug 1387490 and then quitting the browser, but then when I tried to repeat that in order to record/debug it, I couldn't get it to happen again. I'm guessing something racy may be going on, and my particular system has a hard time getting the timing just right to catch it.
Flags: needinfo?(jfkthame)
Updated•7 years ago
|
Summary: Stylo: Linux64 debug e10s-off parallel reftest / mochitest runs crash on shutdown → Stylo: Linux64 debug parallel reftest / mochitest runs crash on shutdown
Comment hidden (obsolete) |
Comment 19•7 years ago
|
||
Comment 20•7 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #16)
> Yes, I'm using --disable-e10s.
>
> I did get the assertion once after loading the testcase from bug 1387490 and
> then quitting the browser, but then when I tried to repeat that in order to
> record/debug it, I couldn't get it to happen again. I'm guessing something
> racy may be going on, and my particular system has a hard time getting the
> timing just right to catch it.
You may want to clear all cache before trying again probably?
Also you can probably try applying the patch in bug 1369815 then executing "mach mochitest layout/style/test/chrome". This is a reasonably small test set, which, according to that bug, can reproduce this issue without stylo.
Flags: needinfo?(jfkthame)
Reporter | ||
Comment 21•7 years ago
|
||
This also appears to affect Linux 32-bit Stylo runs:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4225df6913ab00cb97bcdb17b8c29a24ee52673b
(Ignore the browser chrome part, that seems unrelated...) Mochitest chrome, a11y, and clipboard seem to trigger this issue on Linux 32-bit with Stylo enabled.
Summary: Stylo: Linux64 debug parallel reftest / mochitest runs crash on shutdown → Stylo: Linux debug parallel reftest / mochitest runs crash on shutdown
Comment 22•7 years ago
|
||
So I did several try push yesterday with the assertion removed... but those patch doesn't seem to work.
It seems on Linux, we use the system cairo rather than our in-tree one, and it seems we don't switch to the in-tree one even if I make MOZ_TREE_CAIRO set unconditionally in old-configure.in. I have no idea why...
Comment 23•7 years ago
|
||
We *are* using the in-tree cairo. However, *gtk* is using system cairo, and there's no way around that.
Comment 24•7 years ago
|
||
So, not all tests trigger this assertion.
I've figured out that accessible/tests/mochitest/test_aria_token_attrs.html is one test which can trigger this shutdown assertion in a11y.
Given the testcase in bug 1387490, I suspect this is related to the <input>s in this test as well.
Comment 25•7 years ago
|
||
I'm still finding this very hard to reproduce locally with any consistency.
I did catch it in rr once (after loading the testcase from bug 1387490, then quitting the browser), and poked around enough to confirm that we had one stray gfxFont instance still alive with refcnt=1. This was an instance of the DejaVu Serif font, which was created early in the run (it was the second gfxFont instance to be created).
Exploring this led me to notice bug 1393185, which was causing a lot of refcounting noise, though I don't believe it is directly connected to the eventual assertion here. But having touched my build there, I can no longer replay that recording in order to investigate further; and now I can't seem to trigger the assertion again to get a fresh recording. :(
Flags: needinfo?(jfkthame)
Comment 26•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 (PTO, back Aug 28) from comment #24)
> I've figured out that accessible/tests/mochitest/test_aria_token_attrs.html
> is one test which can trigger this shutdown assertion in a11y.
I got this test to trigger the cairo hashtable assertion during shutdown, though it's worth noting that it is preceded by a couple of other assertions as well, starting with a LOUD WARNING ABOUT JSRuntime LEAKAGE:
GECKO(9593) | WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME. FIX THIS!
GECKO(9593) | [9593] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp, line 1040
GECKO(9593) | #01: mozilla::ShutdownXPCOM(nsIServiceManager*) (/home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp:1040)
GECKO(9593) | #02: NS_ShutdownXPCOM (/home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp:825)
GECKO(9593) | #03: ScopedXPCOMStartup::~ScopedXPCOMStartup() (/home/jkew/mozdev/mozilla-central/toolkit/xre/nsAppRunner.cpp:1478)
GECKO(9593) | #04: operator delete(void*) (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/mozalloc.h:224)
GECKO(9593) | #05: XRE_main(int, char**, mozilla::BootstrapConfig const&) (/home/jkew/mozdev/mozilla-central/toolkit/xre/nsAppRunner.cpp:4905)
GECKO(9593) | #06: mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) (/home/jkew/mozdev/mozilla-central/toolkit/xre/Bootstrap.cpp:46)
GECKO(9593) | #07: do_main(int, char**, char**) (/home/jkew/mozdev/mozilla-central/browser/app/nsBrowserApp.cpp:237 (discriminator 2))
GECKO(9593) | #08: main (/home/jkew/mozdev/mozilla-central/browser/app/nsBrowserApp.cpp:309)
GECKO(9593) | #09: __libc_start_main (/build/glibc-bfm8X4/glibc-2.23/csu/../csu/libc-start.c:325)
GECKO(9593) | #10: _start (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox)
GECKO(9593) | #11: ??? (???:???)
GECKO(9593) | [9593] ###!!! ASSERTION: 2543 dynamic atom(s) with non-zero refcount: New Folder…,inactive,menu_openFile,panel-wide-item,onDOMMenuInactive,context_unpinTab,context_reloadTab,notification-anchor-icon,context-sep-ctp,addon-webext-perm-notification-content,BMB_bookmarksPopup,menu_sendLink,cmd_cut,View:ReaderView,pointerlock-warning,cmd_quitApplication,Email Image…,X,onsizemodechange,messageCloseButton,...: 'nonZeroRefcountAtomsCount == 0', file /home/jkew/mozdev/mozilla-central/xpcom/ds/nsAtom
GECKO(9593) | #01: nsACString::~nsACString() (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/nsTSubstring.h:347)
GECKO(9593) | #02: mozilla::BaseAutoLock<mozilla::Mutex>::~BaseAutoLock() (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Mutex.h:171)
GECKO(9593) | #03: mozilla::ShutdownXPCOM(nsIServiceManager*) (/home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp:1050)
GECKO(9593) | #04: NS_ShutdownXPCOM (/home/jkew/mozdev/mozilla-central/xpcom/build/XPCOMInit.cpp:825)
GECKO(9593) | #05: ScopedXPCOMStartup::~ScopedXPCOMStartup() (/home/jkew/mozdev/mozilla-central/toolkit/xre/nsAppRunner.cpp:1478)
GECKO(9593) | #06: operator delete(void*) (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/mozalloc.h:224)
GECKO(9593) | #07: XRE_main(int, char**, mozilla::BootstrapConfig const&) (/home/jkew/mozdev/mozilla-central/toolkit/xre/nsAppRunner.cpp:4905)
GECKO(9593) | #08: mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) (/home/jkew/mozdev/mozilla-central/toolkit/xre/Bootstrap.cpp:46)
GECKO(9593) | #09: do_main(int, char**, char**) (/home/jkew/mozdev/mozilla-central/browser/app/nsBrowserApp.cpp:237 (discriminator 2))
GECKO(9593) | #10: main (/home/jkew/mozdev/mozilla-central/browser/app/nsBrowserApp.cpp:309)
GECKO(9593) | #11: __libc_start_main (/build/glibc-bfm8X4/glibc-2.23/csu/../csu/libc-start.c:325)
GECKO(9593) | #12: _start (/home/jkew/mozdev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox)
GECKO(9593) | #13: ??? (???:???)
GECKO(9593) | firefox: ../../../../src/cairo-hash.c:217: _cairo_hash_table_destroy: Assertion `hash_table->live_entries == 0' failed.
TEST-INFO | Main app process: killed by out-of-range signal, number 134
Buffered messages finished
238 ERROR TEST-UNEXPECTED-FAIL | accessible/tests/mochitest/test_aria_token_attrs.html | application terminated with exit code -134
I don't know whether those earlier leak messages have any connection with the cairo-hash issue (due to holding something alive beyond the expected shutdown time) or not.
However, if I run this under rr, the assertions magically go away; despite repeated attempts, it consistently exits cleanly. Which makes investigation harder... and suggests that this may be some kind of timing-dependent shutdown race, and the overhead of rr recording (on my machine, anyhow) is enough to perturb the behavior and prevent it happening.
Comment 27•7 years ago
|
||
It's... mysterious.
So if we run only the specific test on try, we leak the world as well: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d148caee5ca701aa4a7a983caa1d2218cf66b15d&filter-tier=3&selectedJob=125062322
However, if we run several tests, we end up having this assertion without leaking the world: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aebe4461764721d163c8b59870d06f869cf57159&filter-tier=3&selectedJob=125029871
That says, there may actually be two different issues on this...
It is also weird that why this happens only for debug build. If this is an assertion in system cairo (which gtk uses), then it should always assert, or never assert, if the leak happens consistently. If this is timing-dependent, then probably we are just lucky on opt builds that this situation never happens...
Comment 28•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 (PTO, back Aug 28) from comment #27)
> It is also weird that why this happens only for debug build. If this is an
> assertion in system cairo (which gtk uses), then it should always assert, or
> never assert, if the leak happens consistently. If this is timing-dependent,
> then probably we are just lucky on opt builds that this situation never
> happens...
IIRC, this assertion comes from (somewhere underneath) cairo_debug_reset_static_data, which I don't believe we call at all in release builds.
Comment 29•7 years ago
|
||
Bug 1393585 is another report of what I'd guess is probably the same issue. :jesup reports (on irc) similar inability to reproduce under rr.
Reporter | ||
Updated•7 years ago
|
Comment 30•7 years ago
|
||
I have repro'd it under rr -- add the -h flag when recording (chaos mode). Unfortunately, my kernel has a bug and when I tried to replay the trace, rr failed -- I'll update my kernel/etc and try again.
Flags: needinfo?(jfkthame)
Comment 31•7 years ago
|
||
So, if I skip the whole MOZ_gdk_display_close (via undefining CLEANUP_MEMORY in toolkit/xre/nsAppRunner.cpp), the task would finish successfully without any leak detected. [1]
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=18c0d85723fbb3289843068e1409817626da898d&filter-tier=3
Comment 32•7 years ago
|
||
Actually... it doesn't seem to fail anymore! Let me see other tasks.
Comment 34•7 years ago
|
||
FWIW, as I mentioned in bug 1387993 comment 4, this crash goes away for most of the tasks, probably because of bug 1383332 and/or bug 1393632 which make us do fewer parallelism. tc-M(c1) on Linux x64 seems to be the only affected task now. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=183bca8aff06dc698683c94cdea36489e21c40a5&selectedJob=126332600
Comment 35•7 years ago
|
||
And in particular, using a checking from before that bug landed (last week) might improve reproducibility.
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 37•7 years ago
|
||
Bug 1387490 discovered this same issue via fuzzing. It has a small test case that might aid in reproducing, however it was found before the traversal changes that Xidorn mentions in comment 34, so you may still need to go back in time before those traversal changes for the best change at reproduction.
Comment 39•7 years ago
|
||
Bug 1393585 may also have useful STR here.
Reporter | ||
Updated•7 years ago
|
Comment 40•7 years ago
|
||
So if commenting out the assertion shows that we don't get any of our own leak reports when the assertion would have happened, that suggests the problem might be related to the memory management of GTK, GDK, or cairo objects.
Have we done anything to verify that any of those objects whose reference-counting is single-threaded are being reference-counted only on the correct threads? It seems like reference-count races could lead to something like this.
Or might there be similar issues related to management of statics, either in our code or in those libraries?
Comment 41•7 years ago
|
||
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #40)
> So if commenting out the assertion shows that we don't get any of our own
> leak reports when the assertion would have happened, that suggests the
> problem might be related to the memory management of GTK, GDK, or cairo
> objects.
Not so fast... I probably should do a try push again with an earlier commit, because the push in comment 31 doesn't show anything simply because it doesn't fail anymore as I mentioned in comment 32, and I realized what happened after that (in comment 34).
Comment 42•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #41)
> Not so fast... I probably should do a try push again with an earlier commit,
> because the push in comment 31 doesn't show anything simply because it
> doesn't fail anymore as I mentioned in comment 32, and I realized what
> happened after that (in comment 34).
Alright, I just did a try push [1] with a m-c commit on Aug 21 which is supposed to still have this problem, and without triggering the assertion, we don't have any other leak, so it probably means the problem is related to other libraries we link to.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=a9564909c81aa6abdacdda26d17a7122bfa9f585&filter-tier=3
Assignee | ||
Comment 43•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #18)
> FWIW, this seems to affect non-stylo as well. See bug 1369815 where the test
> change leads to a shutdown assertion similar to bug 1387490.
https://bugzilla.mozilla.org/show_bug.cgi?id=1369815#c35 seems to indicate
that was a PresContext leak, and so is unrelated.
Comment 44•7 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #43)
> (In reply to Xidorn Quan [:xidorn] UTC+10 from comment #18)
> > FWIW, this seems to affect non-stylo as well. See bug 1369815 where the test
> > change leads to a shutdown assertion similar to bug 1387490.
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=1369815#c35 seems to indicate
> that was a PresContext leak, and so is unrelated.
Yep, that one is unrelated.
Comment hidden (obsolete) |
Comment 46•7 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #45)
> I'm yet to reproduce this bug locally.
>
> I've run the build from comment 42, with STYLO_THREADS=4
> STYLO_FORCE_ENABLED=1
> in the environment, against gtk, pango, cairo, and glibc libraries from
> Ubuntu
> 16.04, with Ubuntu's Ambiance theme, and failed to reproduce with steps of
> bug
> 1387490 and bug 1393585. I've tried this with and without rr record -h.
To be clear, the build in comment 42 has the cairo leak checking turned off (with the hopes that it would allow us to reach some gecko leak checking, which didn't happen). So you wouldn't reproduce anything with that build.
>
> I've similarly run caps/tests/mochitest/test_bug995943.xul against a recent
> local build and against a local build based on e16dba457260 (i.e. before
> changes for bug 1383332 and bug 1393632).
>
> I did catch one world leak, but this did not trigger this assertion.
> This was with rr record -h from a local build of e16dba457260, closing the
> browser after loading https://bugzilla.mozilla.org/attachment.cgi?id=8893860.
> e10s was disabled but there was still a child (perhaps for tab previews?):
Yes, I see this sometimes as well. Would be interesting to investigate, but ideally we'd focus on the specific cairo crash.
I just grabbed a build (--enable-debug --disable-optimize --enable-stylo) from a few weeks ago that I had lying around, based on [1]. I was able to reproduce it reliably with:
./mach run --disable-e10s https://en.wikipedia.org/wiki/Barack_Obama --debugger=rr --debugger-args="record -h"
The --disable-e10s is necessary to reproduce. The backtrace isn't that useful because my system cairo doesn't have any debug symbols. Not sure if there's an easy way to install symbols after the fact?
So anyway, I have a recording, and can play human gdbserver over IRC if need be. But hopefully you can reproduce it yourself using the above. Let me know.
[1] https://hg.mozilla.org/mozilla-central/rev/d25db0546c92
Flags: needinfo?(karlt)
Assignee | ||
Comment 47•7 years ago
|
||
Thank you for that.
It seems the key to reproducing reliably is to load an input before the
Nightly Start Page.
STYLO_THREADS=4 STYLO_FORCE_ENABLED=1 ./mach run --disable-e10s 'data:text/html,<input>'
will reproduce reliably with the parent of
https://hg.mozilla.org/mozilla-central/rev/a9f5863c9d48
but not with that change for bug 1393632.
I have a recording and will investigate the cause.
(FWIW the world leak happens even with stylo off.)
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(karlt)
Assignee | ||
Comment 48•7 years ago
|
||
Flags: needinfo?(jfkthame)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 52•7 years ago
|
||
mozreview-review |
Comment on attachment 8904420 [details]
bug 1384701 remove MOZ_WIDGET_GTK == 2 code from nsLookAndFeel
https://reviewboard.mozilla.org/r/176244/#review181370
(might not be the appropriate person to review this, but it looks ok to me)
Attachment #8904420 -
Flags: review?(manishearth) → review+
Comment 53•7 years ago
|
||
Awesome - thanks for figuring this our Karl!
Comment 54•7 years ago
|
||
mozreview-review |
Comment on attachment 8904421 [details]
bug 1384701 get system font name and size from widget style context instead of GtkSettings
https://reviewboard.mozilla.org/r/176246/#review181384
Attachment #8904421 -
Flags: review?(manishearth) → review+
Comment 55•7 years ago
|
||
mozreview-review |
Comment on attachment 8904422 [details]
bug 1384701 get system fonts in EnsureInit() which is on main thread even with servo
https://reviewboard.mozilla.org/r/176248/#review181392
looks ok to me, but I might not be the best person to review this
::: commit-message-9ae5f:16
(Diff revision 1)
> +
> +A GtkEntry already exists on the main thread, as well as style contexts for
> +most other system fonts, and so it is more efficient to create these on the
> +main thread while the style contexts exist.
> +
> +Doing this also avoids the need for Gecko_nsFont_InitSystem() to hold a global
should the lock be removed?
Attachment #8904422 -
Flags: review?(manishearth) → review+
Comment 56•7 years ago
|
||
Thanks for this! I'm not overly familiar with all this code so you may want to get a font peer to review it, but it looks good from my side!
Assignee | ||
Comment 57•7 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #56)
> Thanks for this! I'm not overly familiar with all this code so you may want
> to get a font peer to review it, but it looks good from my side!
Thank you for looking these over.
This is ancient code. It's been moved around but hasn't really changed
significantly since changes for bug 176842. The people involved are no longer
around.
I've tested that scaling works as expected with GDK_SCALE=2 in the
environment, and that the style resolves with this in
~/.config/gtk-3.0/gtk.css:
entry {
font-family: serif;
}
dbaron more or less suggested this change in
https://bugzilla.mozilla.org/show_bug.cgi?id=96971#c0
"It would make it much easier (well, I would be able to avoid duplicating a
lot more code) to implement system fonts correctly on GTK, since for GTK we
have to create native widgets and poke at their style information, and I'd
rather not duplicate all the creation/destruction code that already exists in
nsLookAndFeelGTK for the system colors implementation."
I'm happy if someone else would like to review, but I don't think we need to
wait for dbaron to return and review this. GTK3 is different from GTK2
anyway.
Assignee | ||
Comment 58•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8904422 [details]
bug 1384701 get system fonts in EnsureInit() which is on main thread even with servo
https://reviewboard.mozilla.org/r/176248/#review181392
> should the lock be removed?
My point here was meant to be that, if GTK calls were to continue to be risked
off main thread from nsLookAndFeel::GetFontImpl(), then
Gecko_nsFont_InitSystem() would need to compete with
Gecko_GetLookAndFeelSystemColor() for the same lock.
Each of those functions does hold a lock, but they hold different locks, and
so are not mutually exclusive.
From the perspective of the GTK port, there is no longer a need for any lock
around calls to LookAndFeel::GetFont(), and nsRuleNode::ComputeSystemFont()
also looks thread-safe to me.
However, the WINNT port at least does not look thread-safe [3], and so the
lock can't just be removed. From a quick look at the Mac port, it might be
thread-safe or possible to make it thread-safe, but that requires more
investigation.
[3] http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/widget/windows/nsLookAndFeel.cpp#713
Comment 59•7 years ago
|
||
Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4166c5ecc5d7
remove MOZ_WIDGET_GTK == 2 code from nsLookAndFeel r=manishearth
https://hg.mozilla.org/integration/autoland/rev/143ef903d8f6
get system font name and size from widget style context instead of GtkSettings r=manishearth
https://hg.mozilla.org/integration/autoland/rev/e711937ff6fa
get system fonts in EnsureInit() which is on main thread even with servo r=manishearth
Comment 60•7 years ago
|
||
(In reply to Pulsebot from comment #59)
> Pushed by ktomlinson@mozilla.com:
> https://hg.mozilla.org/integration/autoland/rev/4166c5ecc5d7
> remove MOZ_WIDGET_GTK == 2 code from nsLookAndFeel r=manishearth
What does this have to do with this bug? Wasn't that still kept for redhat?
Updated•7 years ago
|
Flags: needinfo?(karlt)
Assignee | ||
Comment 61•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #60)
> What does this have to do with this bug?
The file modified here is hard to understand with all the GTK2 #ifs.
It is much easier to understand the associated code portions and changes here without the many #ifs.
Maintaining support for GTK2 in the system fonts code modified to address this
bug would require more #ifs.
> Wasn't that still kept for redhat?
https://bugzilla.mozilla.org/show_bug.cgi?id=1278282#c7 indicated that RedHat
no longer need this.
Flags: needinfo?(karlt)
Assignee | ||
Updated•7 years ago
|
Comment 62•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4166c5ecc5d7
https://hg.mozilla.org/mozilla-central/rev/143ef903d8f6
https://hg.mozilla.org/mozilla-central/rev/e711937ff6fa
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in
before you can comment on or make changes to this bug.
Description
•