Closed Bug 1739884 Opened 3 years ago Closed 1 year ago

Slow cold start: mozilla::widget::GfxInfo::GetData takes 3s

Categories

(Core :: Graphics, defect)

Firefox 94
x86_64
Linux
defect

RESOLVED DUPLICATE of bug 1787182

People

(Reporter: gordian.dziwis, Unassigned)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/94.0

Steps to reproduce:

Cold start Firefox.
I profiled the startup: https://profiler.firefox.com/public/0ga0meyv4bfzhngy42kzc8zm2sa7mx37qyfway8/calltree/?globalTrackOrder=0w3&localTrackOrderByPid=118725-201~119585-0~119613-0~119628-0&thread=0&timelineType=cpu-category&v=6
mozilla::widget::GfxInfo::GetData takes 3s

Actual results:

Firefox takes several seconds to start.

Expected results:

Startup should be faster on a high-end laptop.

The Bugbug bot thinks this bug should belong to the 'Core::Widget' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Widget
Product: Firefox → Core

The severity field is not set for this bug.
:jimm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)

Any chance perf devs can help investigate what's going on here?

Component: Widget → Graphics
Flags: needinfo?(jmathies) → needinfo?(bas)
Severity: -- → S3

I don't think I can see the thread we're waiting on in that profile.

Flags: needinfo?(bas)

Right now the latest Firefox does not start at all. Cold start hangs with an indefinite wait in mozilla::widget::GfxInfo::GetData().
AMD GPU, latest drivers from rpmfusion.
The problem is semi-reproducible: it may or may not go away after several attempts to start Firefox and after starting other browsers.
A hard workaround is to start Firefox like this:
ssh -Y localhost firefox
which apparently causes the GLX query to fail and so keeps Firefox from hanging at startup:

Crash Annotation GraphicsCriticalError: |[0][GFX1-]: glxtest: X error, error_code=1, request_code=156, minor_code=1 (t=0.160321) [GFX1-]: glxtest: X error, error_code=1, request_code=156, minor_code=1
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: glxtest: X error, error_code=1, request_code=156, minor_code=1 (t=0.160321) |[1][GFX1-]: glxtest: process failed (exited with status 1) (t=0.160405) [GFX1-]: glxtest: process failed (exited with status 1)
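
For anyone comparing annotations like the ones above across reports, here is a minimal Python sketch that pulls the X error fields out of a GraphicsCriticalError string. The function name and the regular expression are illustrative assumptions based only on the two log lines quoted here; this is not a Firefox tool, and other annotation formats may not match.

```python
import re

# Pattern derived from the "glxtest: X error" records quoted above (assumption:
# other Firefox versions may format this message differently).
X_ERROR_RE = re.compile(
    r"glxtest: X error, error_code=(\d+), request_code=(\d+), minor_code=(\d+)"
)


def parse_glxtest_x_errors(annotation):
    """Return the distinct X error records found in a crash annotation string."""
    records = []
    for match in X_ERROR_RE.finditer(annotation):
        record = {
            "error_code": int(match.group(1)),
            "request_code": int(match.group(2)),
            "minor_code": int(match.group(3)),
        }
        # The annotation repeats each message, so keep only the first copy.
        if record not in records:
            records.append(record)
    return records
```

Both annotation lines above reduce to a single record with error_code 1, request_code 156 and minor_code 1.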

(In reply to Eugene Kanter from comment #5)

Can you please attach gdb to the Firefox process (perhaps a glxtest process) and check where it's frozen?
https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Application_freeze

Thanks.
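
A rough Python sketch of the gdb step, for reference. It assumes gdb is installed and that you are allowed to attach to the target process; the helper names are hypothetical, and the PID has to be looked up by hand because glxtest shows up as "firefox" in the process list.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb invocation that attaches to pid, prints every thread's stack, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid, timeout_s=60.0):
    """Run gdb against pid and return whatever it printed on stdout."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout
```

Saving the return value of dump_backtraces(pid) to a text file gives something that can be attached to the bug.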

Flags: needinfo?(gordian.dziwis)
Attached file firefox-trace.txt (deleted)

(In reply to Martin Stránský [:stransky] (ni? me) from comment #6)

Can you please attach gdb to the Firefox process (perhaps a glxtest process) and check where it's frozen?

Call stack uploaded.

The same result occurs with the official Mozilla binary. Is it possible to rebuild without the offending method?

Redirect a needinfo that is pending on an inactive user to the triage owner.
:bhood, since the bug has recent activity, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gordian.dziwis) → needinfo?(bhood)
Flags: needinfo?(bhood)
OS: Unspecified → Linux

Martin, is this at all actionable?

Flags: needinfo?(stransky)

Yes. It's caused by a hang in the glxtest process. We added a 4s timeout to it in Bug 1813500. Bug 1787182 delays the glxtest run until the first Firefox start and skips it if we redirect the request to a remote Firefox instance.

I'm afraid the glxtest hang itself can't be fixed because it's caused by broken drivers.
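
To illustrate the timeout idea in general terms, here is a small Python sketch that runs a probe as a child process and gives up after 4 seconds. This is not the Firefox implementation from Bug 1813500 (which is not shown in this bug); run_probe and its arguments are hypothetical names, and the child here is just a Python one-liner standing in for a driver query.

```python
import subprocess
import sys


def run_probe(argv, timeout_s=4.0):
    """Run a probe command; return its stdout, or None if it hangs or fails."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, OSError):
        # subprocess.run kills and reaps the child when the timeout expires,
        # so a hung probe cannot block the caller indefinitely.
        return None
    if result.returncode != 0:
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a hanging driver query: sleeps far longer than the timeout.
    hung = run_probe([sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=0.5)
    print("probe result:", hung)
```

The caller treats None as "no usable graphics information" and can fall back to a safe default instead of waiting on the child.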

Flags: needinfo?(stransky)

We may also rename the glxtest process to make it explicit in the process view (right now it's named 'firefox' in the process list, the same as the main application).

Blocks: gfx-triage
Hardware: Unspecified → x86_64

(In reply to Martin Stránský [:stransky] (ni? me) from comment #12)

Yes. It's caused by a hang in the glxtest process. We added a 4s timeout to it in Bug 1813500. Bug 1787182 delays the glxtest run until the first Firefox start and skips it if we redirect the request to a remote Firefox instance.

I'm afraid the glxtest hang itself can't be fixed because it's caused by broken drivers.

Martin, where is it hanging in the driver?

Flags: needinfo?(stransky)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #14)

(In reply to Martin Stránský [:stransky] (ni? me) from comment #12)

Yes. It's caused by a hang in the glxtest process. We added a 4s timeout to it in Bug 1813500. Bug 1787182 delays the glxtest run until the first Firefox start and skips it if we redirect the request to a remote Firefox instance.

I'm afraid the glxtest hang itself can't be fixed because it's caused by broken drivers.

Martin, where is it hanging in the driver?

See Bug 1813500, for instance, which hangs in VA-API detection.

To get better backtraces for hangs, we'd need to:

  • name the glxtest process correctly (right now it's named "firefox", the same as the main application)
  • split the glxtest and VA-API tests so we don't disable WebRender when VA-API crashes or freezes
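
The second point can be sketched in Python as follows: each feature gets its own probe, and a failure in one probe only disables that feature. The feature names and the decide_features helper are hypothetical and chosen only to mirror the discussion; Firefox's actual feature gating is more involved.

```python
def decide_features(probes):
    """Map each feature name to True/False, evaluating every probe in isolation."""
    enabled = {}
    for feature, probe in probes.items():
        try:
            enabled[feature] = bool(probe())
        except Exception:
            # A crash in this probe disables only its own feature.
            enabled[feature] = False
    return enabled


def glx_probe():
    return True  # stand-in for a GLX query that succeeds


def vaapi_probe():
    raise RuntimeError("simulated VA-API detection failure")


if __name__ == "__main__":
    print(decide_features({"webrender": glx_probe, "vaapi": vaapi_probe}))
```

With a single combined probe, the simulated VA-API failure would have taken the GLX result down with it.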
Flags: needinfo?(stransky)
No longer blocks: gfx-triage

Any progress on resolving the root cause here?

(In reply to Martin Stránský [:stransky] (ni? me) from comment #12)

Yes. It's caused by a hang in the glxtest process. We added a 4s timeout to it in Bug 1813500. Bug 1787182 delays the glxtest run until the first Firefox start and skips it if we redirect the request to a remote Firefox instance.

I'm afraid the glxtest hang itself can't be fixed because it's caused by broken drivers.

Could anyone comment on the "broken drivers" statement?
References:
https://bugzilla.redhat.com/show_bug.cgi?id=2089380
https://gitlab.freedesktop.org/drm/amd/-/issues/2252

Bug 1787182 should fix most of the issues. We don't test or enable VA-API on known broken drivers, for instance r600 on AMD.

Please try to reproduce the hang/crash on the latest Nightly with a clean profile, and file a new bug if it still occurs.
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems#Testing_Mozilla_binaries

Thanks.

Status: UNCONFIRMED → RESOLVED
Closed: 1 year ago
Duplicate of bug: 1787182
Resolution: --- → DUPLICATE