Firefox window randomly freezes
Categories: Core :: Widget, defect, P3
People: Reporter: nuromi, Assigned: mstange
References: Regression
Details: Keywords: regression
Attachments (10 files, all deleted): 7 text/plain, 1 video/mp4, 1 image/jpeg, 1 text/x-phabricator-request
Flags on the text/x-phabricator-request attachment: diannaS: approval-mozilla-beta+, dmeehan: approval-mozilla-release-, RyanVM: approval-mozilla-esr102+
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0
Steps to reproduce:
Surf the web.
Actual results:
The Firefox window randomly freezes. Only the window freezes; Firefox itself keeps working.
If I interact with the window (scrolling, clicking a link, opening a new tab, etc.) and then minimize and reopen it, the changes are reflected in the window (although it is still frozen). This continues until I close and reopen Firefox.
Expected results:
Firefox window does not freeze.
OS: Debian 11 with Xfce 4.16
Firefox 102.0.1 from Mozilla binaries
Comment 1•2 years ago
Please see https://support.mozilla.org/kb/firefox-hangs-or-not-responding and report back
Comment 2•2 years ago
The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 3•2 years ago
I could not reproduce the issue on Ubuntu 20.04 using build 102.0.1 (20220705093820).
Can you please provide the website that is freezing? Does the problem still happen if you start Firefox in Safe Mode? (Safe Mode disables add-ons, extensions and themes, hardware acceleration and some JavaScript stuff in order to exclude some possible reasons for problems.) See https://support.mozilla.org/en-US/kb/troubleshoot-firefox-issues-using-safe-mode
And does this also happen with a new and empty profile? See https://support.mozilla.org/en-US/kb/troubleshoot-and-diagnose-firefox-problems#w_6-create-a-new-firefox-profile .
(In reply to Andre Klapper from comment #1)
Please see https://support.mozilla.org/kb/firefox-hangs-or-not-responding and report back
I've been trying the solutions there but no luck yet
(In reply to Monica Chiorean from comment #3)
Can you please provide the website that is freezing?
It can happen on any webpage and at any moment.
Does the problem still happen if you start Firefox in Safe Mode? (Safe Mode disables add-ons, extensions and themes, hardware acceleration and some JavaScript stuff in order to exclude some possible reasons for problems.) See https://support.mozilla.org/en-US/kb/troubleshoot-firefox-issues-using-safe-mode
And does this also happen with a new and empty profile? See https://support.mozilla.org/en-US/kb/troubleshoot-and-diagnose-firefox-problems#w_6-create-a-new-firefox-profile .
I'm going to try to test that, but since the freezes happen very randomly (anywhere from once a week to three times in a day), I don't know how long it will take me.
I followed this guide https://udn.realityripple.com/docs/Mozilla/How_to_report_a_hung_Firefox and submitted a couple of crash reports in case they help:
https://crash-stats.mozilla.org/report/index/2c9fa42b-5fdb-4a07-ab1d-0746b0220719
https://crash-stats.mozilla.org/report/index/7c90c5e0-b19c-44d9-974a-9a3d20220729
I found this Reddit post from someone with the same problem as me:
https://www.reddit.com/r/firefox/comments/weqzwm/firefox_suddenly_freezes_on_certain_sites_linux/
It still happens in a newly created profile (with a new .mozilla folder), with default settings and no add-ons, although it took a week and a half to happen again.
I have recorded a video of the bug.
Comment 10•2 years ago (Reporter)
Hello? Is anyone here?
I don't know what more to do, so I am open to suggestions.
Comment 11•2 years ago
I and a number of others running Linux Mint have run into the same issue as described in this topic: https://forums.linuxmint.com/viewtopic.php?f=47&t=376770
A change introduced in FF102 is at the root of our issue. I have a Timeshift snapshot of my system with FF101.0.1 and a copy of my profile with FF101.1 so I can revert to FF101.0.1 and work without freezes. With FF102 and newer, using a fresh profile, running in safe mode, having hardware acceleration on or off, etc. makes no difference. It still freezes. It sort of gives the impression of a possible race condition or a stuck memory situation?
I do want to thank nuromi for mentioning minimizing and then maximizing again gets FF to change because that has been helpful to me to see other tabs I have open before I have to close FF to clear the problem. I usually just switch to another application rather than minimizing apps.
In my case, I normally have multiple tabs open when the freeze occurs, although the number of tabs and the length of time I have had FF open does not seem to correlate to when the problem happens. In FF102 (and 103) when this problem happened if I clicked to change tabs, the title above the tab changed and the tab with focus changed, but the page (below the tab) did not change. Thus I would have a tab from one page in focus with the page from the prior tab still on screen. If I clicked the x to close a tab, the tab might have closed, although usually it did not. If the tab did disappear, the page did not repaint so I had a gap (space) where the tab would have been.
In FF104, if I click a different tab, the tab with focus does not change; only the title at the very top of the page changes to indicate I clicked a different tab. However, if I minimize and then maximize in FF104 after closing a tab, the tab disappears. When these freezes happen, I cannot click the + to open a new tab.
I have tried tracking memory usage (of just firefox.bin) at the time of the freeze, but there does not seem to be a correlation. Sometimes I can go all day before it happens. Other times I am lucky if I make it an hour or two. I am currently running FF104.0.1 and it is still happening.
Comment 12•2 years ago
If possible, please try to find a regression range, you will get a pushlog url at the end:
$ pip3 install -U mozregression
$ ~/.local/bin/mozregression --good 100 --bad 103
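If the last good and first bad releases are already known, the same command can be given a tighter range, since mozregression accepts release numbers for --good and --bad. A minimal sketch, assuming mozregression is installed and on the PATH; `bisect_firefox` is a hypothetical wrapper that only prints the command unless `RUN=1` is set in the environment.

```shell
# Hypothetical wrapper around mozregression.
# $1 = last known-good release, $2 = first known-bad release.
bisect_firefox() {
    good="$1"
    bad="$2"
    # Build the command as positional parameters so it can be printed or run.
    set -- mozregression --good "$good" --bad "$bad"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"          # actually start the bisection
    else
        echo "$*"     # dry run: show what would be executed
    fi
}
```

For example, `bisect_firefox 101 102` prints the command for a 101-to-102 range, and `RUN=1 bisect_firefox 101 102` starts the bisection.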
General ideas:
- Test https://nightly.mozilla.org.
- Nvidia
  - bug 1737834: video memory leak due to hardware cursor and GBM: https://gitlab.gnome.org/GNOME/mutter/-/issues/2045#note_1332012
  - bug 1788573
  - bug 1752717
  - bug 1743051, bug 1723323:
    bug 1751252 should have blocked partial present at least if Nvidia is the primary GPU. If Firefox can start on Intel and then use Nvidia after suspend and resume, partial present might not be blocked.
    Open about:config, set gfx.webrender.max-partial-present-rects=0 and gfx.webrender.allow-partial-present-buffer-age=false, restart Firefox and check whether the problem comes back.
- For Intel users who have manually force-enabled hardware rendering:
  Remove the deprecated Intel DDX driver and use the default modesetting driver (bug 1710400 comment 20):
  sudo apt remove xserver-xorg-video-intel
- To prevent glxtest crash and fallback to software rendering, remove deprecated libva-vdpau-driver: bug 1787182 comment 2
- Try disabling GLX vsync: Open about:config, set layout.frame_rate=60, restart Firefox.
- XFCE and KDE users should try disabling their compositor (restart Firefox afterwards) and should also check if the same problem occurs with Gnome.
- Try enforcing software rendering: Open about:config, set gfx.webrender.software=true, restart Firefox.
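The two partial-present preferences named above can also be set through a `user.js` file in the profile directory, which Firefox reads at startup. A minimal sketch, assuming the profile directory is passed in by hand; the helper name `write_partial_present_prefs` is hypothetical.

```shell
# Hypothetical helper: append the two partial-present test prefs to user.js
# in the Firefox profile directory given as $1.
write_partial_present_prefs() {
    cat >> "$1/user.js" <<'EOF'
user_pref("gfx.webrender.max-partial-present-rects", 0);
user_pref("gfx.webrender.allow-partial-present-buffer-age", false);
EOF
}
```

Prefs set this way are reapplied on every start, so the lines need to be removed from `user.js` again once testing is finished.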
Comment 13•2 years ago
Also, please run Firefox from a terminal with MOZ_LOG="Widget:5" and watch what happens during the freeze: does Firefox print debug output (i.e. is it receiving events)? Does it get keyboard/mouse events?
Thanks.
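One way to capture this log is sketched below, assuming a POSIX shell and `firefox` on the PATH. `MOZ_LOG` and `MOZ_LOG_FILE` are Gecko's logging environment variables; the helper name `run_with_widget_log` and the default log path are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: run a command with Gecko widget logging enabled.
# $1 = log file path; any further arguments are the command to run
# (defaults to "firefox" when none are given).
run_with_widget_log() {
    logfile="${1:-$HOME/firefox-widget.log}"   # assumed default location
    if [ "$#" -gt 0 ]; then
        shift
    fi
    if [ "$#" -eq 0 ]; then
        set -- firefox
    fi
    # MOZ_LOG selects module:level pairs; MOZ_LOG_FILE sends the output to a file.
    MOZ_LOG="Widget:5" MOZ_LOG_FILE="$logfile" "$@"
}
```

For example, `run_with_widget_log "$HOME/fx-widget.log" firefox` starts the browser with logging on, and the resulting log can be inspected after the next freeze.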
Comment 14•2 years ago
(In reply to Martin Stránský [:stransky] (ni? me) from comment #13)
Also, please run Firefox from a terminal with MOZ_LOG="Widget:5" and watch what happens during the freeze: does Firefox print debug output (i.e. is it receiving events)? Does it get keyboard/mouse events?
Thanks.
I tried this today. I ran
firefox --MOZ_LOG="Widget:5"
from the terminal and watched the terminal while I had the browser up. When the freeze happened, keystrokes did register in the terminal as I attempted to change tabs or do anything else.
Yesterday when a freeze happened, I was able to save a bookmark of a page (using the minimize/maximize the window trick), but parts of the window were missing. I blindly saved what came up. I did the same today and snapped a photo. I will attach it so you can see what I mean about parts missing.
Comment 15•2 years ago
This is what came up when I attempted to save a bookmark of the page (a page I had not yet had a chance to read). I just blindly hit enter and the bookmark did save.
Comment 16•2 years ago
(In reply to Darkspirit from comment #12)
If possible, please try to find a regression range, you will get a pushlog url at the end:
$ pip3 install -U mozregression
$ ~/.local/bin/mozregression --good 100 --bad 103
If I know 101.0.1 was good and I first started having problems in 102, would I still want to use --good 100 --bad 103? I have no idea what is causing the problem, so all I can do is work as I normally would and see if it happens.
General ideas:
- Test https://nightly.mozilla.org.
- Try disabling GLX vsync: Open about:config, set layout.frame_rate=60, restart Firefox.
- Try enforcing software rendering: Open about:config, set gfx.webrender.software=true, restart Firefox.
My laptop only has onboard Intel graphics (Sandy Bridge era Celeron processor) and I have always used modesetting. I think these might be the only options which apply to my situation. I will give the last one a try and see if the freezes stop (because I'm not sure if GLX vsync applies in my situation?). I have always had the options in Settings > General > Performance unchecked. However, that must be a different setting because when I checked gfx.webrender.software it is set to false.
Comment 17•2 years ago (Reporter)
Hi.
In my case, freezes are very infrequent, about once a week, so any test I do will take a long time to confirm the result.
So, thanks to Susan and the other Linux Mint users for helping with troubleshooting.
Comment 18•2 years ago
There have only been one or two days where I was able to make it through without a freeze. It's my impression (which may not be accurate) those with Nvidia seem to be running into the freezes a bit less often than those of us with Intel and AMD. However, it's also possible web activity levels might be a factor. No real way for me to be able to judge that.
Both 'Try disabling GLX vsync:' and 'Try enforcing software rendering' resulted in a freeze. (I tried the options separately as did someone on the forum with AMD graphics.) I will move on to figuring out pip and mozregression to get them installed.
Comment 19•2 years ago (Reporter)
I will leave my system information here in case it is useful.
Comment 20•2 years ago
Because I do not know what is triggering the problem, I decided if I could run two days without a freeze then I would consider that "good" and move on to the next version in the regression. Normally, I close Firefox each night and either shut down my computer or suspend it. To keep the test running I just disconnected from the Internet last night and did not suspend. I only had one blank tab up in Firefox.
I downloaded the first nightly build which came up. Adjusted my settings as I normally have them, imported my bookmarks (HTML file from Firefox 104) and started working yesterday afternoon. This morning it froze. I thought the first build would be an approximation of the Firefox 101.0.1 version I had been successfully using. This is what I was testing.
https://archive.mozilla.org/pub/firefox/nightly/2022/04/2022-04-04-23-18-05-mozilla-central/firefox-101.0a1.en-US.linux-x86_64.tar.bz2
Is there something about a nightly build that might be different from a final version I would normally get?
I will try again but this time I will start with --good 99 instead of 100. Please let me know if I should be trying something else.
Comment 21•2 years ago
Comment 22•2 years ago
The large raw text attachment above turned out to be a jumbled mess.
Here's a snippet, hopefully of the more relevant parts:
Features
  Compositing: WebRender (Software)
  WebGL 1 Driver Renderer: Intel Open Source Technology Center -- Mesa DRI Intel(R) G41 (ELK)
  WebGL 1 Driver Version: 2.1 Mesa 21.2.6
  WebGL 2 Driver WSI Info: -
  WebGL 2 Driver Renderer: WebGL creation failed:
    * tryNativeGL (FEATURE_FAILURE_EGL_NO_CONFIG)
    * Exhausted GL driver options. (FEATURE_FAILURE_WEBGL_EXHAUSTED_DRIVERS)
  WebGL 2 Driver Version: -
  WebGL 2 Driver Extensions: -
  HW_COMPOSITING
    available by default
    disabled by user: Disabled by layers.acceleration.disabled=true
  OPENGL_COMPOSITING
    unavailable by default: Hardware compositing is disabled
  WEBRENDER
    available by default
    disabled by env: Not qualified
    unavailable-no-hw-compositing by runtime: Hardware compositing is disabled
  WEBRENDER_QUALIFIED
    available by default
    blocklisted by env: No qualified hardware
  WEBRENDER_COMPOSITOR
    disabled by default: Disabled by default
    blocklisted by env: Blocklisted by gfxInfo
    blocked by runtime: Cannot be enabled in release or beta
  WEBRENDER_PARTIAL
    available by default
  WEBRENDER_SHADER_CACHE
    disabled by default: Disabled by default
    unavailable by runtime: WebRender disabled
  WEBRENDER_OPTIMIZED_SHADERS
    available by default
    unavailable by runtime: WebRender disabled
  WEBRENDER_ANGLE
    available by default
    unavailable by env: OS not supported
  WEBRENDER_SOFTWARE
    available by default
  WEBGPU
    disabled by default: Disabled by default
    blocked by runtime: WebGPU cannot be enabled in release or beta
  X11_EGL
    available by default
  DMABUF
    available by default
  HARDWARE_VIDEO_DECODING
    available by default
    unavailable by runtime: Force disabled by gfxInfo
  DMABUF_SURFACE_EXPORT
    blocked by default: Blocklisted by gfxInfo
  BACKDROP_FILTER
    available by default
Failure Log
  (#0) Error glxtest: VA-API test failed: no supported VAAPI profile found.
Comment 23•2 years ago
I have not tried firefox --MOZ_LOG="Widget:5" yet, but I can confirm Susan's experience: when the freeze occurs, keyboard and mouse input still responds, albeit at a snail's pace, and the GUI only partially updates.
I did in the past try this logging:
export NSPR_LOG_MODULES=all:5
export NSPR_LOG_FILE=~/firefox/firefox.log
Full details are reported here: https://forums.linuxmint.com/viewtopic.php?p=2207661#p2207661
The gist is that the logs themselves didn't seem to indicate anything out of the ordinary (comparing non-freezing vs freezing), EXCEPT that whenever the freeze occurs, the last 3 or so child processes created, as shown by the "firefox.log.child-XXX" files, have always been killed.
e.g.
firefox.log.child-591
firefox.log.child-591.moz_log
firefox.log.child-590
firefox.log.child-590.moz_log
firefox.log.child-589
firefox.log.child-589.moz_log
firefox.log.child-585
firefox.log.child-588
firefox.log.child-588.moz_log
firefox.log.child-587
firefox.log.child-587.moz_log
but checking:
ps -ef | grep 'childID 591'
ps -ef | grep 'childID 590'
ps -ef | grep 'childID 589'
only shows that "childID 585" and earlier are running when this freeze happens.
childID 591, 590, and 589, despite being the most recently created child processes, seem to have been killed or terminated.
I don't know if my interpretation of this debugging facility is correct, but this is the consistent behavior I observe on my end.
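The manual comparison above could be automated along these lines. This is a sketch, assuming the log files follow the `firefox.log.child-N` naming shown above and that child processes carry `childID N` on their command line, as the `ps` checks imply; both function names are hypothetical. `pgrep -f` is used instead of `ps -ef | grep` so the search does not match its own process.

```shell
# Print the unique child IDs found in log file names such as
# firefox.log.child-591 or firefox.log.child-591.moz_log in directory $1.
child_ids_from_logs() {
    for f in "$1"/firefox.log.child-*; do
        [ -e "$f" ] || continue                 # skip the unexpanded glob
        name="${f##*/firefox.log.child-}"       # e.g. "591.moz_log"
        echo "${name%%.*}"                      # e.g. "591"
    done | sort -n -u
}

# For each child ID with a log file, report whether a matching process exists.
report_children() {
    for id in $(child_ids_from_logs "$1"); do
        if pgrep -f "childID $id( |\$)" >/dev/null 2>&1; then
            echo "childID $id: running"
        else
            echo "childID $id: not running"
        fi
    done
}
```

Running `report_children ~/firefox` during a freeze would list which of the logged child processes are gone.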
Comment 24•2 years ago (Reporter)
Another bug with the same problem https://bugzilla.mozilla.org/show_bug.cgi?id=1780972
Comment 25•2 years ago
(In reply to nuromi from comment #24)
Another bug with the same problem https://bugzilla.mozilla.org/show_bug.cgi?id=1780972
I linked comment 12 in bug 1780972 comment 14.
Difference: In comment 0 you have software rendering on XFCE, but bug 1780972 uses hardware rendering on XFCE.
Comment 26•2 years ago
(In reply to VJ from comment #21)
...no 3d compositing...
Could you check if turning on WM compositing works around the issue, given that it's one of the similarities with bug 1780972? That would be great :)
Comment 27•2 years ago
Here is my system info. I am running Cinnamon desktop with the Effects turned off, but running with its default compositing enabled. I do have the Firefox setting for "Use hardware acceleration when available." unchecked, but I've always had it that way.
I am back on Firefox 101.0.1 temporarily to get some work done without having to worry about losing work due to freezes, but expect to resume trouble-shooting by the weekend.
Comment 28•2 years ago
(In reply to Robert Mader [:rmader] from comment #26)
(In reply to VJ from comment #21)
...no 3d compositing...
Could you check if turning on WM compositing works around the issue, given that it's one of the similarities with bug 1780972? That would be great :)
Sorry I misspoke about this. So it turns out I was running a compositor, just with all the effects disabled.
As an aside, does running a compositor always imply using OpenGL/3d portions of the gpu? (or can one use 2d/X11 primitives?)
Anyways, in MATE I have these choices for Window Manager + Compositor combos:
Marco
Marco + Compositing
Marco + Compton
Metacity
Metacity + Compositing
Metacity + Compton
Compiz
The choices with NO compositing are Marco and Metacity.
wm-detect will tell me if I'm running a compositor or not.
Now I remember I had changed from the default "Marco" to "Marco + Compositing" some time ago, but forgot about that.
Now I realize the only difference between the two is a slight shadow around the window edges.
So I can confirm that with the "Marco + Compositing" selection, the freeze does still occur. Should I switch back to Marco without compositing?
Comment 29•2 years ago
I think it's directly related to your hardware (G41), as all the reports here use it.
I wonder if there's any driver/Mesa bug which we hit with the latest Firefox version.
Comment 30•2 years ago
(In reply to VJ from comment #28)
So I can confirm that with the "Marco + Compositing" selection, the freeze does still occur. Should I switch back to Marco without compositing?
Compositing means you use transparent windows (usually used for decorations & shadows).
Could it be bug 1756903?
Do you see any difference if you run Firefox as:
MOZ_GTK_TITLEBAR_DECORATION=system firefox
or
MOZ_GTK_TITLEBAR_DECORATION=client firefox
or
MOZ_GTK_TITLEBAR_DECORATION=none firefox
?
Comment 31•2 years ago
(In reply to Martin Stránský [:stransky] (ni? me) from comment #29)
I think it's directly related to your hardware (G41), as all the reports here use it.
I wonder if there's any driver/Mesa bug which we hit with the latest Firefox version.
Mine is Intel HD Graphics 2000 (SNB GT1) and not (G41), but in the Mint forum thread it did seem the issue was more likely to happen if one's computer was in the 10+ year old range, regardless of graphics (Intel, AMD, Nvidia).
Comment 32•2 years ago
I've been running mozregression tests since Sunday (see comment 30 on bug 1780972). The May 4 nightly ran for 3+ days without issue. Yesterday I began testing the May 5 nightly, and it froze twice within a few hours. From my testing, it appears that this 'issue,' whatever it is, was introduced between the 5/4 and 5/5 nightlies.
The mozregression tool now has me testing some sort of interim builds that I am not familiar with, but I will continue to test whatever builds it offers me, and will report results back here as need be.
As Susan said above, I don't think this problem has been confined to any HW combo. I am on an AMD processor with Radeon graphics.
Comment 33•2 years ago
(In reply to Martin Stránský [:stransky] (ni? me) from comment #30)
(In reply to VJ from comment #28)
So I can confirm that with the "Marco + Compositing" selection, the freeze does still occur. Should I switch back to Marco without compositing?
Compositing means you use transparent windows (usually used for decorations & shadows).
ok makes sense, given the meaning of the word
Could it be bug 1756903?
Do you see any difference if you run Firefox as:
MOZ_GTK_TITLEBAR_DECORATION=system firefox
or
MOZ_GTK_TITLEBAR_DECORATION=client firefox
or
MOZ_GTK_TITLEBAR_DECORATION=none firefox
I don't see any difference between those 3 and with the variable unset
Comment 34•2 years ago
BTW I don't know if it's just timing and incidental usage pattern on this machine, but I recall I hit freezing more frequently in v.102.x, then seemed like it progressively decreased in v.103.x and with the update to v.104.0.x I've only hit it once (a little before my first post here) if I recall.
I did notice one big beneficial change recently is the much faster loading of saved sessions.
If there is a way to get a stack trace of the current tab (or all tabs or core dump) I can try that too the next time I encounter it
Comment 35•2 years ago
Also when running it from the command line, I get:
[GFX1-]: glxtest: VA-API test failed: no supported VAAPI profile found.
ATTENTION: default value of option mesa_glthread overridden by environment.
...
[GFX1-]: Managed to allocate after flush.
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: Managed to allocate after flush.
// on spotify:
Sandbox: attempt to open unexpected file /sys/devices/system/cpu/cpu0/cache/index2/size
Sandbox: attempt to open unexpected file /sys/devices/system/cpu/cpu0/cache/index3/size
Sandbox: attempt to open unexpected file /sys/devices/system/cpu/present
Sandbox: attempt to open unexpected file /sys/devices/system/cpu
Sandbox: unexpected multiple open of file /proc/cpuinfo
// on youtube:
[2022-09-09T17:18:43Z ERROR mp4parse] Found 2 nul bytes in "\0\0"
[2022-09-09T17:18:43Z ERROR mp4parse] Found 2 nul bytes in "\0\0"
[2022-09-09T17:18:43Z ERROR mp4parse] Found 2 nul bytes in "\0\0"
....
// don't know when/where:
[Parent 51792, Main Thread] WARNING: g_object_ref: assertion 'G_IS_OBJECT (object)' failed: 'glib warning', file /builds/worker/checkouts/gecko/toolkit/xre/nsSigHandlers.cpp:167
(firefox:51792): GLib-GObject-CRITICAL **: 10:26:51.227: g_object_ref: assertion 'G_IS_OBJECT (object)' failed
(/usr/lib/firefox/firefox-bin:57989): dconf-WARNING **: 10:26:51.397: Unable to open /var/lib/flatpak/exports/share/dconf/profile/user: Permission denied
Comment 36•2 years ago
Regarding not hitting the freeze lately on my end since v.104+ (just once, before the update to 104.0.2): I should mention that the rest of the Mint MATE system has also been continuously updated, so there are potentially other confounding variables in libraries, drivers, and minor kernel updates (5.4.x). That would matter if the cause, since the official Firefox v.102 release, depends on external system factors, which seems to be the case as it apparently affects mostly quite old systems.
Comment 37•2 years ago
I spoke too soon. I finally hit it again on v.104.0.2 and only after a day of use.
In case it matters, just wanted to note everything I came across up to the point of the freeze:
[Child 9213, MediaDecoderStateMachine #5] WARNING: Decoder=7fe78d73ec00 state=DECODING_METADATA Decode metadata failed, shutting down decoder: file /builds/worker/checkouts/gecko/dom/media/MediaDecoderStateMachine.cpp:370
[Child 9213, MediaDecoderStateMachine #5] WARNING: Decoder=7fe78d73ec00 Decode error: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006) - static MP4Metadata::ResultAndByteBuffer mozilla::MP4Metadata::Metadata(mozilla::ByteStream *): Cannot parse metadata: file /builds/worker/checkouts/gecko/dom/media/MediaDecoderStateMachineBase.cpp:151
[Parent 2126, Main Thread] WARNING: g_object_ref: assertion 'G_IS_OBJECT (object)' failed: 'glib warning', file /builds/worker/checkouts/gecko/toolkit/xre/nsSigHandlers.cpp:167
lots of:
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
There was also a lot of memory pressure, enough to unload tabs, while I loaded many YouTube tabs and switched around to others. Then I closed all of those.
I also ran "Minimize memory usage" in about:memory.
Then I started again more aggressively: opened a few tabs in the background, then switched between them.
During the freeze incident, I observed:
main firefox parent process used 562 MB resident memory, 12.2 GB virtual memory
- 8 "Isolated Web Co" child processes
each process using: ~102 MB to 141 MB resident memory, 2.3GB to 2.5GB virtual memory
status of all children are sleeping, with occasional wakeup for short runs
Then upon pkill firefox:
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
....
Comment 38•2 years ago
I am back to testing and just figured out why I was having problems with what should have been good builds on my first go-round. I use Thunderbird and when I get forum notifications I launch from the link in the email. I did not notice doing that had opened my installed Firefox 104 version and I had been doing my work in it instead of the MozRegression build. Now that mystery is solved, hopefully I can eventually produce a pushlog url.
Comment 39•2 years ago
I encountered another freeze on v.104.0.2 again after two days. The amount of time doesn't really seem to matter; it appears more dependent on usage. I've left it running for several days without hitting it on this machine, seemingly when using it more lightly or occasionally.
I'm curious about the Firefox virtual memory consumption I cited above. I encountered this large-memory scenario again this latest time. I attempted to generate a core dump with gcore, which seemed to hit failures but still left a 13GB core file from the main parent Firefox process.
In both cases where I've looked at this during a freeze, Firefox's virtual memory exceeded my total of 4GB of physical memory plus 7.6GB of swap.
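To compare these numbers more systematically, virtual and resident size can be read from `/proc` on Linux. A small sketch; `mem_summary` is a hypothetical helper name. Note that virtual size also counts reserved address space that may never be backed by RAM or swap, so a value above RAM plus swap is not by itself proof of a leak.

```shell
# Print virtual (VmSize) and resident (VmRSS) memory for process id $1,
# as reported in kB by Linux in /proc/<pid>/status.
mem_summary() {
    awk '/^VmSize:|^VmRSS:/ { print $1, $2, $3 }' "/proc/$1/status"
}
```

For example, `mem_summary 2126` would print the two values for the parent process ID seen in the log above; the PID has to be looked up by hand for each session.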
Comment 40•2 years ago
Mozregression testing update:
I believe I have finally emerged from the rabbit hole that is 'autoland' build testing.
When I marked the last build tested 'good' this afternoon, the tool gave me this info:
2022-09-21T16:25:35.997000: INFO : Narrowed integration regression window from [daae2d11, 805110b5] (3 builds) to [ad30f002, 805110b5] (2 builds) (~1 steps left)
2022-09-21T16:25:36.080000: DEBUG : Starting merge handling...
2022-09-21T16:25:36.084000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=805110b540517d2531951ea874bc9d4670eddfaf&full=1
2022-09-21T16:25:36.094000: DEBUG : redo: attempt 1/3
2022-09-21T16:25:36.096000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=805110b540517d2531951ea874bc9d4670eddfaf&full=1',), kwargs: {}, attempt #1
2022-09-21T16:25:36.125000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2022-09-21T16:25:38.882000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=805110b540517d2531951ea874bc9d4670eddfaf&full=1 HTTP/1.1" 200 None
2022-09-21T16:25:38.895000: DEBUG : Found commit message:
Bug 1765399 - Don't create a new SoftwareVsyncSource instance when layout.frame_rate is changed to a different value. r=smaug
Differential Revision: https://phabricator.services.mozilla.com/D144378
2022-09-21T16:25:38.898000: DEBUG : Did not find a branch, checking all integration branches
2022-09-21T16:25:38.924000: INFO : The bisection is done.
2022-09-21T16:25:38.938000: INFO : Stopped
Here is a summary of all tests I ran in the past 2 1/2 weeks:
Mozregression testing began Sunday, Sept 4,
with release 101 = 'good,' release 102 = 'bad'
All 'good' tests were allowed to run ~3 days before being marked as good.
All failed tests occurred within a few hours of beginning testing.
Tested build from May 16 - failed.
Tested build from May 9 - failed.
Tested build from May 6 - failed.
Tested build from May 4 - ran for 3+ days without failure. Labeled as 'good.'
Tested build from May 5 - failed.
Tested 'mozilla central build: 228073cf...', build_date: 2022-05-05 11:35:40.967000 - failed
Tested 'autoland' build: 2022-05-04 23:11:43.174000 - marked 'good'
Testing 'autoland' build b869511e: 2022-05-05 01:42:53.865000 - marked good
9/14 12:00
Testing 'autoland' build daae2d11: 2022-05-07 13:10:54.295000 - marked good
9/18 3:20p
Testing 'autoland' build 000ea190: 2022-05-07 13:00:14.643000 application_buildid: 20220505040320 - marked bad
9/18 6:30p
Testing 'autoland' build 805110b5: 2022-05-07 13:08:39.010000 application_buildid: 20220505034937 - marked bad
9/18 7:15p
Testing 'autoland' build ad30f002: 2022-05-05 05:03:51.410000 application_buildid: 20220505034020 - marked good
9/21 4:30p
Marked 'autoland' build ad30f002 good.
Let me know if you have any questions, or any further tests you would like run.
Comment 41•2 years ago
I am now bisecting taskclusters on 2022-05-05. Wish I could say I see a pattern that leads to the freeze, but it still seems random.
These are the nightly builds the command line tool had me test for $ ~/.local/bin/mozregression --good 100 --bad 103
2022-04-04 - firefox 101.0a1 - good (ran 2 days with no problems)
2022-06-27 - firefox-104.0a1 - bad ( ~1.5 days before it froze)
2022-05-16 - firefox 102.0a1 - bad ( ~20 minutes before it froze)
2022-04-25 - firefox 101.0a1 - good (ran 2 days with no problems)
2022-05-06 - firefox 102.0a1 - bad ( ~21 hrs before it froze)
2022-05-01 - firefox 101.0a1 - good (ran 2 days with no problems)
2022-05-04 - firefox 102.0a1 - good (ran 2 days with no problems)
2022-05-05 - firefox 102.0a1 - bad ( ~26 hrs before it froze)
Comment 42•2 years ago
I made it to the pushlog!
29568:56.82 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ad30f0024f7f5677c8d0ab804d16916629cf9e97&tochange=805110b540517d2531951ea874bc9d4670eddfaf
Comment 43•2 years ago
:mstange, since you are the author of the regressor, bug 1765399, could you take a look? Also, could you set the severity field?
For more information, please visit auto_nag documentation.
Comment 44•2 years ago
If it helps any, on my system I have in:
/etc/X11/xorg.conf.d/20-intel.conf
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    Option "AccelMethod" "SNA"
    Option "TearFree" "true"
EndSection
Comment 45•2 years ago
Is there something more I can do to help? Maybe traces I could run or logs that might be helpful?
We have people from other distros joining the Linux Mint forum to indicate they too are having the freeze problems.
Comment 46•2 years ago
Comment 44 might be a good hint: the deprecated Intel DDX driver, especially with TearFree, has proven extremely buggy in the past, so much so that we had to disable hardware acceleration for it altogether (see bug 1710400). It would be interesting if this is the common denominator here.
Can anyone affected check if the output of xrandr --listproviders contains name:Intel? If so, can you check whether switching to glamor solves the issue, e.g. with apt remove xserver-xorg-video-intel?
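The check on the `xrandr --listproviders` output can be scripted roughly as follows. This is a sketch that assumes the output format shown elsewhere in this bug (a `name:` field at the end of each `Provider` line); `provider_names` and `uses_intel_ddx` are hypothetical helper names.

```shell
# Read `xrandr --listproviders` output on stdin and print each provider's name.
provider_names() {
    sed -n 's/^Provider [0-9][0-9]*:.* name:\(.*\)$/\1/p'
}

# Succeed if any provider reports "name:Intel", which this comment treats
# as the sign of the deprecated Intel DDX driver.
uses_intel_ddx() {
    provider_names | grep -qx 'Intel'
}
```

For example, `xrandr --listproviders | provider_names` lists the driver names, and `xrandr --listproviders | uses_intel_ddx && echo "Intel DDX in use"` flags the affected configuration.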
Comment 47•2 years ago
I have a 2nd-gen Intel which uses modesetting.
xrandr --listproviders
Providers: number : 1
Provider 0: id: 0x49 cap: 0x9, Source Output, Sink Offload crtcs: 2 outputs: 8 associated providers: 0 name:modesetting
Comment 48•2 years ago
Mine, (Intel G41 mobo chipset gpu, pre-UHD series.... like pre-"legacy")
xrandr --listproviders
Providers: number : 1
Provider 0: id: 0x46 cap: 0x9, Source Output, Sink Offload crtcs: 3 outputs: 4 associated providers: 0 name:Intel
I am a little hesitant to remove the default driver/package for a few reasons.
First, I still have no way of reliably reproducing it, especially as more recently, it can go for a while.. over a week easily without hitting it. Or I can hit it quickly, a couple times in a day.
Would it even support glamor? This is a really old motherboard GPU, prior to the generation of Intel CPUs with integrated GPUs, and I don't trust it to fully implement all the OpenGL functions. And if that's not available, and I'm using the modesetting driver without the Intel DDX, would I just be using pure software rendering without the basic h/w 2D acceleration?
The other member had an issue with AMD, and another forum member had an issue with a modern Coffee Lake UHD GPU, which should be using glamor judging from a test install I did before on another machine.
With regards to bug 1710400, I never had this issue until exactly the official v102 release, while bug 1710400 stated it was from v88.
However, if you insist, I can still try on removing xserver-xorg-video-intel
Comment 49•2 years ago
Both RandyS in this topic and a newcomer to the Mint forum who just posted today are using AMD graphics. This issue transcends the intel driver (which most of us having the problem are NOT using).
After being notified by a website I was running an out-of-date browser (I had been using 101.0.1 since I finished testing), I decided to give 106.0.1 a try. The problem is still there. I was able to use the trick of minimize-FF/maximize-FF to get the browser screen to change and salvage some of my work and save a bookmark before I gave up and closed Firefox. I should not have to lose work just to make Firefox functional.
Is there something I can be logging that might be helpful in determining what is happening?
Comment 50•2 years ago
After redoing the same piece of work for the third time tonight because the browser froze on me in the middle of my first two attempts, I gave up on FF106.0.1 after six days. I am back on FF101.0.1.
I reviewed the pushlog changes/bug that is causing this problem and noticed there is code to handle an issue with Wayland that was not originally expected to be a problem. I'm using X Server and wonder if maybe Wayland was not the only code affected and it hits all Linux-based distros?
There's got to be some type of corner case that I and others are hitting. I've been doing my best to track memory and CPU usage and there is no consistency with regards to those values and when a freeze happens. It also does not relate to how long the browser has been open. So frustrating. Will try to spend some more time checking the pushlog code now that I'm back on a browser version where I can manage my time better because I'm not continually having to redo work.
Comment 51•2 years ago
I am another one who is affected by this bug. I can confirm that:
- the freezes appear randomly
- after a freeze happens FF is operational, but the results of scrolling are visible after minimizing and maximizing
- after a freeze happens there is no information in the console (when FF is launched from the console), but the mechanism of displaying info from FF in the console is still working
After the update from FF ESR 91.x to FF ESR 102.x I noticed:
- frequent freezes (every 2-3 days)
- frequent crashes (every 1-2 days)
- one crash after a freeze (today, when I opened a new window so that I could test whether the new window is also frozen - I managed to click on File and New Window in the menu bar)
I use a 15-years-old machine with Mageia 5 linux (32bit, 4.4.114-server-1.mga5) and nvidia 384.111 driver.
Reporter | ||
Comment 52•2 years ago
I would like someone from Mozilla to confirm that they're looking into this issue, and that we are not just being completely ignored...
Comment 53•2 years ago
So with more reports coming in and comment #50, maybe the commonality is Xorg?
Anyone on modern Ubuntu default and Fedora encountering this on Wayland?
Comment 55•2 years ago
Happens also on X11 + KDE (Plasma 5.25.x).
Firefox 105 and 106.
Bug 1794563.
Comment 56•2 years ago
(In reply to nuromi from comment #52)
I would like someone from Mozilla to confirm that they're looking into this issue, and that we are not just being completely ignored...
I believe the Firefox team hasn't even figured out what's causing it to freeze. Therefore, we will have to wait for a fix for a very long time, if it is fixed at all.
Comment 57•2 years ago
This one is really hard to grasp so far :/
It was already mentioned in comment 30, but can somebody else (apart from comment 33) confirm that running with MOZ_GTK_TITLEBAR_DECORATION=system or MOZ_GTK_TITLEBAR_DECORATION=none doesn't help with the freezes?
Comment 58•2 years ago
(In reply to randylow from comment #56)
I believe the Firefox team hasn't even figured out what's causing it to freeze. Therefore, we will have to wait for a fix for a very long time, if it is fixed at all.
Two of us have identified the code change Firefox made that is causing the problem to happen.
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ad30f0024f7f5677c8d0ab804d16916629cf9e97&tochange=805110b540517d2531951ea874bc9d4670eddfaf
I would like to think there is something I could be logging that would help pinpoint the problem in that code, because it is widespread across multiple desktop environments and across multiple Linux-based distros. I am trying to find documentation myself with regards to logging, but I am starting with almost zero knowledge on the topic so my progress is very slow. :(
Comment 59•2 years ago
(In reply to Robert Mader [:rmader] from comment #57)
This one is really hard to grasp so far :/
It was already mentioned in comment 30, but can somebody else (apart from comment 33) confirm that running with MOZ_GTK_TITLEBAR_DECORATION=system or MOZ_GTK_TITLEBAR_DECORATION=none doesn't help with the freezes?
I am running the Cinnamon 5.4 desktop which uses muffin(Mutter) for windows management. I have no options to change it or to change compositing.
When I go to about:config and search for MOZ_GTK_TITLEBAR_DECORATION, it indicates my current setting is Boolean. Are you wanting me to change that to String and then add those values? Or is the fact I am using Boolean a helpful clue? Or is there some place I should be looking for the default value?
Comment 60•2 years ago
(In reply to Susan from comment #59)
When I go to about:config and search for MOZ_GTK_TITLEBAR_DECORATION, it indicates my current setting is Boolean. Are you wanting me to change that to String and then add those values? Or is the fact I am using Boolean a helpful clue? Or is there some place I should be looking for the default value?
Susan, I don't think that value is in about:config. If you type it in the search, it looks to me like it is offering to ADD the value, not modify an existing one.
As far as the setting goes, I'm willing to try about anything. But I am with you, in that I don't know how or where to set that value. Maybe it can be added to the command line parameters? A little guidance here would help us out.
Comment 61•2 years ago
(In reply to RandyS from comment #60)
(In reply to Susan from comment #59)
When I go to about:config and search for MOZ_GTK_TITLEBAR_DECORATION, it indicates my current setting is Boolean. Are you wanting me to change that to String and then add those values? Or is the fact I am using Boolean a helpful clue? Or is there some place I should be looking for the default value?
Susan, I don't think that value is in about:config. If you type it in the search, it looks to me like it is offering to ADD the value, not modify an existing one.
As far as the setting goes, I'm willing to try about anything. But I am with you, in that I don't know how or where to set that value. Maybe it can be added to the command line parameters? A little guidance here would help us out.
As far as I understand, it's an environment variable. You can set it on the same line preceding the firefox command, or you can set it for the whole shell session with export MOZ_GTK_TITLEBAR_DECORATION=client etc., then run firefox from that shell.
Comment 62•2 years ago
Sorry, never mind about MOZ_GTK_TITLEBAR_DECORATION - as pointed out in comment 58 and before, bug 1765399 is apparently to blame.
Comment 63•2 years ago
Also, just to clarify my answer in comment 33 to Martin Stránský's question about whether I "see a difference" between the different MOZ_GTK_TITLEBAR_DECORATION values: I meant that I did not see a difference in the window decoration / GUI / titlebar between any of the values. I was not commenting on the freezing.
In terms of the bug 1765399 change in v102, is there anything regarding Vsync / vsyncsource specifically that we can test in Firefox, such as an env variable or preference, or in the system?
Currently in firefox preferences, there are 5 settings with "vsync"
And system wise, for example, is there something we can try in xrandr after we encounter the freeze to check or maybe work around it?
Also, maybe others who are running Wayland and haven't encountered the bug can switch to X11 and see if they do encounter a freeze or not?
Comment 64•2 years ago
(In reply to VJ from comment #63)
Also, maybe others who are running Wayland and haven't encountered the bug can switch to X11 and see if they do encounter a freeze or not?
I've been reviewing both bugs before going back through the code again and I see someone in the sister Bug 1780972 Comment 27 mentions they can reproduce it in GNOME Wayland which would shoot down that theory (which was just a guess on my part).
I think this may be a more fundamental issue about what is created and how long it lives.
(In reply to VJ from comment #23)
but the gist is that the logs themselves didn't seem to indicate anything out of the ordinary (comparing non-freezing vs freezing) EXCEPT for the fact that whenever the freeze occurs, the last 3 or so child processes created, as evident from the "firefox.log.child-XXX" files, are always killed.
...
childID 591, 590, and 589 despite being the most recently created child processes, seems like they have been killed or terminated but this is the consistent behavior I observe on my end
That is what your earlier testing results I just quoted seem to indicate. Something may be being erroneously terminated.
Plus I found Bug 1789119 which sounds like the same issue. I'll plan to check that data for clues as well.
Comment 65•2 years ago
(In reply to Susan from comment #64)
Thanks a lot for pointing out the other comment that I missed about Wayland.
Also it's great to see someone else also see the same behavior about the most recent child processes being killed (or crashing?)
What I'd like to know is how to find out what those other processes are. The process names given are somewhat nondescript, i.e. "Isolated Web Co", "Web Content", "WebExtensions", "Privileged Cont".
Comment 66•2 years ago
I noticed that the minimize/maximize trick is not the only way to get the window to show the proper content.
When more than one tab was open, I could click on the inactive tab and at this moment the title bar of FF changed appropriately but the window content did not change (however, it was not every time - I noticed a normal behaviour twice per approximately 100 tries). But when you click on the tab once again after a few seconds, the chance of appearance increases up to 30-40 %. Sometimes three or four clicks were needed to bring the tab into view.
Another observation - when I used a down or up arrow once and did the min/max trick once - the content either didn't move or moved slightly, and (surprisingly) when I did the min/max trick once again the content moved slightly once again.
It looks like when the freeze occurs there is a problem with handling mouse and keyboard events.
Hypothesis #1: when the freeze occurs the event queue reports to be empty when it is not empty.
Hypothesis #2: when the freeze occurs the event queue returns incorrectly its first element (returns its first element from the time before last operation of removing its first element was done).
Comment 68•2 years ago
As it's related to the refresh driver, can you run in a terminal with MOZ_LOG="nsRefreshDriver:5" and attach the last ~200 lines when you see the freeze? I wonder if the refresh driver is blocked.
Thanks.
Comment 69•2 years ago
I ran
firefox --MOZ_LOG="nsRefreshDriver:5"
This command spews out A LOT of data so to try and make sure I would be getting the results specifically for when the issue happened, this is what I did.
When I clicked and no action happened, I clicked a couple of other tabs to verify it was "frozen" and then just immediately closed Firefox. I did not try the minimize/maximize trick. I am running Firefox 107.0.
I then highlighted the bottom line in my terminal and kept scrolling upward until it seemed like things were changing. I ended up grabbing 538 lines. I saved those so if you need to go back more than the 200 lines I included in the file I attached, let me know.
Comment 70•2 years ago
Thanks. From the log it's not related to refresh driver - transactions seems to be processed correctly and there isn't any block.
Please run with MOZ_LOG="Widget:5" and if you notice the freeze, check if new lines to the log are added - i.e. if mouse/button clicks are logged.
Please make sure you're running Mozilla binaries and when you notice the freeze, try to get backtrace from the Firefox:
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Getting_Mozilla_crash_report_from_running_application
Comment 71•2 years ago
(In reply to Susan from comment #69)
Created attachment 9304241 [details]
Susan-last 200 lines of MOZ_LOG="nsRefreshDriver:5"
I ran
firefox --MOZ_LOG="nsRefreshDriver:5"
This command spews out A LOT of data so to try and make sure I would be getting the results specifically for when the issue happened, this is what I did.
Susan, you might be able to use the 'MOZ_LOG_FILE' parameter to save output to a file, for example: MOZ_LOG_FILE=/tmp/log.txt
That idea comes from this page, in the 'Linux' section - you can use the export cmds too, instead of command line arguments. I think...
https://firefox-source-docs.mozilla.org/networking/http/logging.html
Comment 72•2 years ago
(In reply to Martin Stránský [:stransky] (ni? me) from comment #70)
Please run with MOZ_LOG="Widget:5" and if you notice the freeze, check if new lines to the log are added - i.e. if mouse/button clicks are logged.
Unless something has changed in the code since I tried this earlier (FF103? I can't recall for sure), then, yes, the mouse/button clicks are logged as I listed in comment #14.
Please make sure you're running Mozilla binaries and when you notice the freeze, try to get backtrace from the Firefox:
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Getting_Mozilla_crash_report_from_running_application
It's my understanding Linux Mint just repackages the Mozilla binaries so they work with our package management system, so I believe I am running Mozilla binaries. Will follow up to try and get a backtrace on the next freeze.
Comment 73•2 years ago
(In reply to Martin Stránský [:stransky] (ni? me) from comment #70)
Please make sure you're running Mozilla binaries and when you notice the freeze, try to get backtrace from the Firefox:
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Getting_Mozilla_crash_report_from_running_application
I was getting error messages in the terminal when I tried to kill the process, but after several different attempts (which all produced error messages?) I finally got the pop-up saying Mozilla had crashed. I told it not to restart to make sure all was stopped. I then picked up the following info when I restarted.
This was the info where I clicked the Submit button
Report ID - 30d48564-1013-484a-5db0-c44b928b8679 11/20/22, 11:42 AM
Then this came up with a View button
Report ID - bp-d51f84c9-a97d-48fa-be98-c063d0221120 11/20/22, 11:45 AM
I have not yet reviewed any of it.
Comment 74•2 years ago
Comment 75•2 years ago
(In reply to Martin Stránský [:stransky] (ni? me) from comment #70)
Thanks. From the log it's not related to refresh driver - transactions seems to be processed correctly and there isn't any block.
Please run with MOZ_LOG="Widget:5" and if you notice the freeze, check if new lines to the log are added - i.e. if mouse/button clicks are logged.
Please make sure you're running Mozilla binaries and when you notice the freeze, try to get backtrace from the Firefox:
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Getting_Mozilla_crash_report_from_running_application
I attached a file with ~200 lines from the terminal. The terminal and Firefox were on different monitors, so I could easily recognize the moment of the freeze (it is indicated in the attached file).
Some additional information (from the last few days):
- Once (so far), a freeze happened a few seconds after firefox was launched, so freezes are not connected with any specific html content.
- There are also frequent crashes of firefox, they started the same time the freezes started (after upgrade from FF ESR 91 to 102). Is it possible that the crashes are connected with the freezes? If you think it may help, I can attach messages printed in the terminal at the moment of crashes.
Comment 76•2 years ago
I found a problem in VsyncDispatcher::UpdateVsyncStatus. It is called from multiple threads but it doesn't protect the order of calls to VsyncSource::AddVsyncDispatcher and VsyncSource::RemoveVsyncDispatcher. After the state is updated but before the calls, one thread could get interrupted while another thread runs, so the state is correct but the calls end up in the wrong order. If they get out of order then in practice it will not fix itself because almost all Vsync updates stop. I had this occur with normal timing after using it for a while with my usual browser activities. Then I added a delay before the Remove call and I can duplicate the problem just by shaking the mouse pointer and pushing the scroll wheel up and down for a couple seconds. I didn't see any logging built into these functions so I added my own printfs.
Assignee | ||
Comment 77•2 years ago
This is phenomenal work, thank you!!
(In reply to Jeff DeFouw from comment #76)
If they get out of order then in practice it will not fix itself because almost all Vsync updates stop.
Hmm, I don't understand this part - shouldn't the AddVsyncDispatcher call cause vsync to be re-enabled?
Comment 78•2 years ago
(In reply to Markus Stange [:mstange] from comment #77)
(In reply to Jeff DeFouw from comment #76)
If they get out of order then in practice it will not fix itself because almost all Vsync updates stop.
Hmm, I don't understand this part - shouldn't the AddVsyncDispatcher call cause vsync to be re-enabled?
Since the calls were done out of order outside of the recorded state (mState) in the Dispatcher, the last call was RemoveVsyncDispatcher even though mObservers is not empty and the Dispatcher's mIsObservingVsync is true. AddVsyncDispatcher will not be called again as long as mObservers is not empty, and that appears to be the case for as long as Firefox keeps running normally. Based on how all the Observer Add/Removes also stop in my logs I assume the observers (at least one of them) are waiting for more Vsync events and they will keep themselves in the Observers until that happens. If mObservers becomes empty then an extra call to RemoveVsyncDispatcher will be made to clear mIsObservingVsync and the next AddVsyncObserver will call AddVsyncDispatcher and that will enable Vsync again. In my logs this actually happens while Firefox is closing.
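For readers following along, the sequence described above can be replayed in a small, single-threaded C++ sketch. This is not the Gecko code: `SourceSketch`, `DispatcherSketch`, and `ReplayOutOfOrder` are hypothetical names, and modelling the source as an idempotent set is an assumption based on the description in this comment. The dispatcher's recorded state ends on "observing", while the source processes the calls as Add, Add, Remove and ends with the dispatcher unregistered.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for VsyncSource. Add and Remove are idempotent,
// which is the pre-fix behavior as described in this thread (an assumption).
struct SourceSketch {
  std::set<int> dispatchers;  // ids of currently registered dispatchers
  void Add(int id) { dispatchers.insert(id); }    // no effect if already present
  void Remove(int id) { dispatchers.erase(id); }  // no effect if absent
  bool IsRegistered(int id) const { return dispatchers.count(id) != 0; }
};

// Hypothetical stand-in for the dispatcher's own recorded state.
struct DispatcherSketch {
  int id = 1;
  bool isObservingVsync = false;
};

// Replays the interleaving without real threads. The dispatcher records
// Add, Remove, Add in that order, but the two later calls reach the source
// swapped, so the source sees Add, Add, Remove.
// Returns true when the two sides disagree in the "stuck" direction.
inline bool ReplayOutOfOrder(SourceSketch& source, DispatcherSketch& d) {
  d.isObservingVsync = true;   // initial registration, delivered in order
  source.Add(d.id);
  d.isObservingVsync = false;  // thread A records "stop listening"
  d.isObservingVsync = true;   // thread B records "start listening"
  source.Add(d.id);            // B's call arrives first: already present, no effect
  source.Remove(d.id);         // A's delayed call arrives last: unregisters
  return d.isObservingVsync && !source.IsRegistered(d.id);
}
```

Calling `ReplayOutOfOrder` on fresh objects returns true in this model: the dispatcher believes it is observing, so it has no reason to call Add again, which matches the "will not fix itself" behavior described above.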
Comment 79•2 years ago
From widget code perspective there isn't anything wrong with the provided logs/backtraces - WebRender rendering doesn't look blocked and we're getting all events from system.
Comment 80•2 years ago
(In reply to Jeff DeFouw from comment #76)
Created attachment 9304353 [details]
Log of Vsync activity showing out-of-order calls with extra delay
I found a problem in VsyncDispatcher::UpdateVsyncStatus. It is called from multiple threads but it doesn't protect the order of calls to VsyncSource::AddVsyncDispatcher and VsyncSource::RemoveVsyncDispatcher. After the state is updated but before the calls, one thread could get interrupted while another thread runs, so the state is correct but the calls end up in the wrong order. If they get out of order then in practice it will not fix itself because almost all Vsync updates stop.
Thank you for your work. I understand your explanation from the standpoint of how everything would freeze. But are those of us who are minimizing and then maximizing the browser to get something to happen on screen just getting lucky that we are picking a thread which has its calls close enough that they are not out of order, and thus something happens?
I usually have many tabs open when the freeze occurs. In order to minimize lost work, I have found if I click to scroll down the page and then minimize and then maximize Firefox, usually the page has scrolled down (as per my click) so I can see the next part of the page. I repeat that process if it is a long page. I have, at times, also been able to successfully bookmark the page so I can return to it. And change tabs to read what I had not yet seen. So sometimes we can get something to happen, but it is usually just one event at a time per min/max of the Firefox window. (It doesn't always work, but that may be a factor of at what point things got out of order?)
Then again, maybe those actions which complete relate to the point you made here and the min/max is a Vsync event?
(In reply to Jeff DeFouw from comment #78)
Based on how all the Observer Add/Removes also stop in my logs I assume the observers (at least one of them) are waiting for more Vsync events and they will keep themselves in the Observers until that happens. If mObservers becomes empty then an extra call to RemoveVsyncDispatcher will be made to clear mIsObservingVsync and the next AddVsyncObserver will call AddVsyncDispatcher and that will enable Vsync again.
Comment 81•2 years ago
(In reply to Susan from comment #80)
Thank you for your work. I understand your explanation from the standpoint of how everything would freeze. But are those of us who are minimizing and then maximizing the browser to get something to happen on screen just getting lucky that we are picking a thread which has its calls close enough that they are not out of order, and thus something happens?
There are many actions you can take that will make a call to WebRenderBridgeParent::ScheduleForcedGenerateFrame and the sequence that follows can decide to force some updating to happen without a Vsync notification and outside of the normal Vsync calls.
Comment 82•2 years ago
(In reply to Jeff DeFouw from comment #76)
Created attachment 9304353 [details]
Log of Vsync activity showing out-of-order calls with extra delay
I found a problem in VsyncDispatcher::UpdateVsyncStatus. It is called from multiple threads but it doesn't protect the order of calls to VsyncSource::AddVsyncDispatcher and VsyncSource::RemoveVsyncDispatcher.
Jeff,
I, too, very much appreciate your efforts.
When you speak of 'multiple threads,' does this relate to the various 'dom.ipc.processCount.*' settings in about:config? I've wondered before, if there is some sort of non-thread-safe issue, if setting those values that are defaulted to 4 and 8 back to 1, and trying to make FF run 'single-threaded,' might make some sort of difference. The behavior you saw just made me all the more curious.
Just for kicks, I changed all those settings to '1' this morning on a newer FF version that has frozen for me before (1.06.05), and am trying it to see what happens. But I honestly don't know if those settings are related to those 'multiple threads' you speak of - though I think they may be... Since you're somewhat familiar with the code, I thought you might know the answer.
I'll get back to this discussion on the results of my 'single-thread' testing. At this point, I figure, what the heck? I'm trying to find something that will keep the browser running...
Comment 83•2 years ago
(In reply to RandyS from comment #82)
I'll get back to this discussion on the results of my 'single-thread' testing. At this point, I figure, what the heck? I'm trying to find something that will keep the browser running...
Well, that didn't take as long as I had hoped. I was using it all morning, but it just froze. So much for that idea.
Assignee | ||
Comment 85•2 years ago
(In reply to Jeff DeFouw from comment #78)
(In reply to Markus Stange [:mstange] from comment #77)
(In reply to Jeff DeFouw from comment #76)
If they get out of order then in practice it will not fix itself because almost all Vsync updates stop.
Hmm, I don't understand this part - shouldn't the AddVsyncDispatcher call cause vsync to be re-enabled?
Since the calls were done out of order outside of the recorded state (mState) in the Dispatcher, the last call was RemoveVsyncDispatcher
Ah of course, I got it now, after looking at your log more closely. The VsyncDispatcher wants to send out "Add myself, Remove myself, Add myself", but the VsyncSource ends up seeing "Add, Add, Remove", so the second Add doesn't end up having any effect, and the dispatcher remains removed. Furthermore, the VsyncDispatcher's mIsObservingVsync state is now wrong - the dispatcher thinks it's still registered at the source, but it's not.
I think there are two options to fix this: We could put the AddVsyncDispatcher/RemoveVsyncDispatcher call inside of the lock, or we can make the VsyncSource keep a "reference count" per dispatcher, so that Add,Add,Remove still ends up with a count of 1 and keeps the dispatcher registered.
I'm going to try the locking solution first. But I'll need to change some VsyncSource implementations so that they no longer call NotifyVsync from inside EnableVsync.
Assignee | ||
Comment 86•2 years ago
With a random-duration sleep() in the right place I was able to reproduce this relatively easily on macOS. So this is really a cross-platform issue which just depends on (un)lucky thread scheduling.
Here's a profile captured of the freeze, on macOS, with a few extra markers: https://share.firefox.dev/3TT93Sk
Assignee | ||
Comment 87•2 years ago
(In reply to Markus Stange [:mstange] from comment #85)
I'm going to try the locking solution first. But I'll need to change some VsyncSource implementations so that they no longer call NotifyVsync from inside EnableVsync.
Hmm, WaylandVsyncSource can definitely call NotifyVsync inside EnableVsync. I'm a bit scared to touch it in a patch that I want to uplift to release, so I think I'll go with the "refcount" solution instead.
Assignee | ||
Comment 88•2 years ago
This fixes a bug which caused Firefox windows to become frozen after some time.
Full credit goes to Susan and RandyS for bisecting the regressor of this bug, and
to Jeff DeFouw for debugging the issue and finding the cause.
The bug here is a "state race" between the VsyncDispatcher state and
the VsyncSource state. Both are protected by locks, and the code that
runs in those locks respectively can see different orders of invocations.
VsyncDispatcher::UpdateVsyncStatus does this thing where it updates its state inside
a lock, gathers some information, and then calls methods on VsyncSource outside the lock.
Since it calls those methods outside the lock, these calls can end up being executed
in a different order than the state changes were observed inside the lock.
Here's the bad scenario in detail, with the same VsyncDispatcher being used from
two different threads, turning a Remove,Add into an Add,Remove:
Thread A                                          Thread B
VsyncDispatcher::UpdateVsync
 |
 |----> Enter VsyncDispatcher lock
 |    |                                           VsyncDispatcher::UpdateVsync
 |    |  state->mIsObservingVsync = false          |
 |    |  (We want to stop listening)               |
 |    |                                            |
 |<---- Exit VsyncDispatcher lock                  |
 |                                                 |----> Enter VsyncDispatcher lock
 |                                                 |    |
 |                                                 |    |  state->mIsObservingVsync = true
 |                                                 |    |  (We want to start listening)
 |                                                 |    |
 |                                                 |<---- Exit VsyncDispatcher lock
 |                                                 |
 |                                                 |----> Enter VsyncSource::AddVsyncDispatcher
 |                                                 |    |
 |                                                 |    |----> Enter VsyncSource lock
 |                                                 |    |    |
 |                                                 |    |    |  state->mDispatchers.Contains(aVsyncDispatcher)
 |----> VsyncSource::RemoveVsyncDispatcher         |    |    |  VsyncDispatcher already present in list, not doing anything
 |    |                                            |    |    |
 |    |                                            |    |<---- Exit VsyncSource lock
 |    |                                            |    |
 |    |                                            |<---- Exit VsyncSource::AddVsyncDispatcher
 |    |----> Enter VsyncSource lock
 |    |    |
 |    |    |  Removing aVsyncDispatcher from state->mDispatchers
 |    |    |
 |    |<---- Exit VsyncSource lock
 |    |
 |<---- Exit VsyncSource::RemoveVsyncDispatcher
Now the VsyncDispatcher thinks it is still observing vsync, but it is
no longer registered with the VsyncSource.
This patch makes it so that two calls to AddVsyncDispatcher followed by one call
to RemoveVsyncDispatcher result in the VsyncDispatcher still being registered.
AddVsyncDispatcher is no longer idempotent.
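As a rough illustration of the "stacking" idea, here is a minimal C++ sketch of a per-dispatcher count. It is not the landed patch: `RefCountedSourceSketch` and the integer dispatcher ids are hypothetical, and silently ignoring an unmatched Remove is a choice made only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical source that counts registrations per dispatcher instead of
// keeping a plain set, so the order of Add and Remove calls no longer matters
// as long as they balance out.
class RefCountedSourceSketch {
 public:
  // Every Add increments the count for that dispatcher.
  void Add(int dispatcherId) {
    std::lock_guard<std::mutex> lock(mMutex);
    ++mCounts[dispatcherId];
  }

  // Every Remove decrements it; the entry disappears only at zero.
  void Remove(int dispatcherId) {
    std::lock_guard<std::mutex> lock(mMutex);
    auto it = mCounts.find(dispatcherId);
    if (it == mCounts.end()) {
      return;  // unmatched Remove: ignored in this sketch
    }
    if (--it->second == 0) {
      mCounts.erase(it);
    }
  }

  bool IsRegistered(int dispatcherId) const {
    std::lock_guard<std::mutex> lock(mMutex);
    return mCounts.count(dispatcherId) != 0;
  }

 private:
  mutable std::mutex mMutex;
  std::map<int, std::size_t> mCounts;  // dispatcher id -> outstanding Adds
};
```

With this model, the sequence Add, Add, Remove leaves a count of 1, so the dispatcher stays registered, and a later balancing Remove unregisters it.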
Comment 89•2 years ago
(In reply to Markus Stange [:mstange] from comment #85)
(In reply to Jeff DeFouw from comment #78)
I think there are two options to fix this: We could put the AddVsyncDispatcher/RemoveVsyncDispatcher call inside of the lock, or we can make the VsyncSource keep a "reference count" per dispatcher, so that Add,Add,Remove still ends up with a count of 1 and keeps the dispatcher registered.
I'm going to try the locking solution first. But I'll need to change some VsyncSource implementations so that they no longer call NotifyVsync from inside EnableVsync.
I noticed there's a RecursiveMutex in xpcom while looking around. Protecting the entire UpdateVsyncStatus call with its own RecursiveMutex and RecursiveMutexAutoLock was the first idea I had as a safe and somewhat simple solution but I'm completely new to the code. It did fix my test case.
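A sketch of that idea in standard C++, using `std::recursive_mutex` in place of the xpcom RecursiveMutex (an assumption made for portability). `ReentrantSourceSketch`, `LockedDispatcherSketch`, and the `onEnable` callback are hypothetical; the callback stands in for a source that notifies the dispatcher synchronously while it is being enabled.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>

// Hypothetical source that may call back into the dispatcher from inside Add.
struct ReentrantSourceSketch {
  std::set<int> dispatchers;
  std::function<void()> onEnable;  // invoked synchronously on a new registration
  void Add(int id) {
    if (dispatchers.insert(id).second && onEnable) onEnable();
  }
  void Remove(int id) { dispatchers.erase(id); }
};

// Hypothetical dispatcher that holds one recursive lock across both the state
// change and the call into the source, so the two cannot be reordered.
class LockedDispatcherSketch {
 public:
  explicit LockedDispatcherSketch(ReentrantSourceSketch& source) : mSource(source) {
    mSource.onEnable = [this] { NotifyVsync(); };
  }

  void UpdateVsyncStatus(bool wantsVsync) {
    std::lock_guard<std::recursive_mutex> lock(mMutex);
    if (wantsVsync == mIsObservingVsync) return;
    mIsObservingVsync = wantsVsync;
    if (wantsVsync) {
      mSource.Add(mId);  // may re-enter NotifyVsync on this same thread
    } else {
      mSource.Remove(mId);
    }
  }

  void NotifyVsync() {
    // Re-acquiring is allowed because the mutex is recursive.
    std::lock_guard<std::recursive_mutex> lock(mMutex);
    ++mNotifications;
  }

  bool InSync() const {
    std::lock_guard<std::recursive_mutex> lock(mMutex);
    return mIsObservingVsync == (mSource.dispatchers.count(mId) != 0);
  }

  int Notifications() const {
    std::lock_guard<std::recursive_mutex> lock(mMutex);
    return mNotifications;
  }

 private:
  ReentrantSourceSketch& mSource;
  mutable std::recursive_mutex mMutex;
  int mId = 1;
  bool mIsObservingVsync = false;
  int mNotifications = 0;
};
```

The recursive lock matters here because the callback re-enters the dispatcher on the same thread; relocking a plain `std::mutex` from the owning thread is undefined behavior. The trade-off is that the lock is held while calling into the source.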
Comment 90•2 years ago
(In reply to Markus Stange [:mstange] from comment #86)
With a random-duration sleep() in the right place I was able to reproduce this relatively easily on macOS. So this is really a cross-platform issue which just depends on (un)lucky thread scheduling.
The Reddit thread mentioned in ( Bug 1789119 ) (which was merged into this bug as a duplicate) has several people running Windows saying they ran into the same issue so I'm not surprised by your results. Plus, it didn't seem like there was any specific OS distinction in the code which is one reason I found it puzzling it seemed only Linux-based distros were really finding it to be a problem.
And even more odd that only some of us were experiencing it and not everyone running Linux-based distros. We theorized in the Linux Mint thread that maybe it had something to do with the age of the computer, but it seems to me if that were the case then I would have been able to more consistently hit the issue. It truly seems erratic as to when it happens. It's more likely to happen in my case if Firefox has been up and running for ~24 hours, but I've had some happen within 20 minutes of restarting. I think it was ~6 hours after restarting this past Sunday. So very random.
Thanks for your work patching this.
Assignee | ||
Comment 91•2 years ago
(In reply to Jeff DeFouw from comment #89)
I noticed there's a RecursiveMutex in xpcom while looking around. Protecting the entire UpdateVsyncStatus call with its own RecursiveMutex and RecursiveMutexAutoLock was the first idea I had as a safe and somewhat simple solution
Good point, this would have been an option, too. I usually try to avoid re-entrant locks but I don't really remember why.
Patch seems to be green on try: https://treeherder.mozilla.org/jobs?repo=try&revision=895937c9f623389cb16c1634dd6a48d171149bbf
Comment 92•2 years ago
Assignee
Comment 93•2 years ago
[Tracking Requested - why for this release]: This regression was introduced in 102. It happens rarely, but if it happens, it freezes the entire browser. This bug was initially thought to only affect a few Linux configurations, but it turns out to affect all platforms.
Comment 95•2 years ago
bugherder
Comment 96•2 years ago
The patch landed in nightly and beta is affected.
:mstange, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox108 to wontfix.
For more information, please visit auto_nag documentation.
Comment 97•2 years ago
Not tracking for 107, but setting 107 to fix-optional.
After some bake time in nightly/beta, this could be considered as a dot release ride-along if nominated for release uplift.
Assignee
Comment 98•2 years ago
The fix is in Firefox Nightly now. To everyone who was seeing this bug somewhat regularly, can you test Nightly and see if the bug is indeed fixed?
I will request uplift now but we may want to wait for some confirmation that the fix worked before uplifting.
Assignee
Comment 99•2 years ago
Comment on attachment 9304780 [details]
Bug 1781167 - Allow stacking calls to Add/RemoveVsyncDispatcher so that we survive the sequence Add,Add,Remove. r=jrmuizel
Beta/Release Uplift Approval Request
- User impact if declined: Frozen browser after extended usage (e.g. 1 day) for some users
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce: (hard and time-intensive to reproduce)
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Tightly-scoped fix.
- String changes made/needed:
- Is Android affected?: Yes
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Dataloss: If this issue occurs, users have to restart the browser
- User impact if declined: Frozen browser after extended usage (e.g. 1 day) for some users
- Fix Landed on Version: 109
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Tightly-scoped fix.
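The patch title above mentions stacking calls so that the sequence Add,Add,Remove is survivable. One way to picture that is a per-dispatcher registration count. The sketch below is an assumption about the general technique, not the landed code: VsyncSourceSketch, DispatcherId, and IsRegistered are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <mutex>

using DispatcherId = int;  // hypothetical handle for a dispatcher

// Hypothetical vsync source that counts registrations per dispatcher
// instead of storing a plain "registered" flag.
class VsyncSourceSketch {
 public:
  void AddVsyncDispatcher(DispatcherId aId) {
    std::lock_guard<std::mutex> lock(mMutex);
    ++mRegistrations[aId];  // a second Add stacks on top of the first
  }

  void RemoveVsyncDispatcher(DispatcherId aId) {
    std::lock_guard<std::mutex> lock(mMutex);
    auto it = mRegistrations.find(aId);
    if (it == mRegistrations.end()) {
      return;  // unmatched Remove: ignored in this sketch
    }
    assert(it->second > 0);
    if (--it->second == 0) {
      mRegistrations.erase(it);  // last Remove really unregisters
    }
  }

  bool IsRegistered(DispatcherId aId) {
    std::lock_guard<std::mutex> lock(mMutex);
    return mRegistrations.count(aId) != 0;
  }

 private:
  std::mutex mMutex;
  std::map<DispatcherId, int> mRegistrations;
};
```

With a plain flag, Add,Add,Remove would leave the dispatcher unregistered even though one Add is still outstanding. With a count, the dispatcher stays registered until a matching second Remove arrives.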
Assignee
Comment 103•2 years ago
There's a report that the patch didn't fix the bug: bug 1802229
Comment 105•2 years ago
Comment on attachment 9304780 [details]
Bug 1781167 - Allow stacking calls to Add/RemoveVsyncDispatcher so that we survive the sequence Add,Add,Remove. r=jrmuizel
Rejecting release uplift per Comment 103, while the investigation continues.
Comments in Bug 1802229 should be investigated to decide whether this should still be considered for 108.
Comment 106•2 years ago
Comment on attachment 9304780 [details]
Bug 1781167 - Allow stacking calls to Add/RemoveVsyncDispatcher so that we survive the sequence Add,Add,Remove. r=jrmuizel
Approved for 108.0b8.
This seems to have corrected the issue for some users in Bug 1802229.
Comment 107•2 years ago
bugherder uplift
Comment 108•2 years ago
Comment on attachment 9304780 [details]
Bug 1781167 - Allow stacking calls to Add/RemoveVsyncDispatcher so that we survive the sequence Add,Add,Remove. r=jrmuizel
Approved for 102.6esr.
Comment 109•2 years ago
bugherder uplift