Closed Bug 1645776 Opened 4 years ago Closed 4 years ago

Crash in [@ g_log_writer_default]

Categories

(Core :: Widget: Gtk, defect)

79 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox77 --- unaffected
firefox78 --- unaffected
firefox79 --- wontfix
firefox80 --- fixed

People

(Reporter: calixte, Assigned: wekerbugs)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files)

This bug is for crash report bp-d609cc2b-da42-4591-a556-c371b0200615.

Top 10 frames of crashing thread:

0 libglib-2.0.so.0 g_log_writer_default 
1 libglib-2.0.so.0 g_log_structured_array 
2 libglib-2.0.so.0 g_log_structured_standard 
3 libgdk-3.so.0 gdk_x11_grab_server 
4 libX11.so.6 _XError 
5 libX11.so.6 _XFreeX11XCBStructure 
6 libX11.so.6 _XReply 
7 libX11.so.6 XGetWindowProperty 
8 libxul.so nsWindow::GetWorkspaceID widget/gtk/nsWindow.cpp:1908
9 libxul.so nsGlobalWindowInner::GetWorkspaceID dom/base/nsGlobalWindowInner.cpp:6615

There are 2 crashes (from 1 installation) in nightly 79 with buildid 20200614092422. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1628749.

[1] https://hg.mozilla.org/mozilla-central/rev?node=70c8e33b5710

Flags: needinfo?(wekerbugs)
Crash Signature: [@ g_log_writer_default] → [@ g_log_writer_default] [@ _g_log_abort | g_log_writer_default]

While I cannot reproduce the error exactly, I can induce this failstate by calling XGetWindowProperty with an intentional wrong atom or window.

I strongly suspect, that this fails on non EWHM compliant window managers, as there are no checks if the atom returned by XInternAtom is valid. This would explain, why this crash is not reproduceable and relativly rare as one needs an EWHM non compliant WM. After a bit of googling I did not find any such WM.

To be safe there should be error handling added for invalid atoms returned by XInternAtom.

Martin Stránský can you kindly review my patch again?

Flags: needinfo?(wekerbugs) → needinfo?(stransky)

This could also be unrelated as X erors are submitted asynchronously.

I see some crashreports [1] that are eerily similar but in a version before the patch [2].

[1] https://hg.mozilla.org/mozilla-central/rev?node=70c8e33b5710
[2] https://crash-stats.mozilla.org/report/index/10224663-3992-49cc-9c3c-001de0200608

Component: Session Restore → Widget: Gtk
Product: Firefox → Core
Assignee: nobody → wekerbugs
Status: NEW → ASSIGNED
Flags: needinfo?(stransky)
Pushed by ncsoregi@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/00b1e2a78573 Check if XInternAtom returned an invalid atom r=stransky
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

The issue can still be observed with recent builds, see e.g. this report

wekerbugs, can you take another look, please?

Status: RESOLVED → REOPENED
Flags: needinfo?(wekerbugs)
Resolution: FIXED → ---
Target Milestone: mozilla79 → ---

I think we also need check wmWindow != nullptr

Flags: needinfo?(wekerbugs)
Pushed by rmaries@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b3d9f2fe4edd [Linux/Gtk] Check XWindow id for XGetWindowProperty(), r=jhorak

(In reply to Martin Stránský [:stransky] from comment #7)

I think we also need check wmWindow != nullptr

That should only happen, when the wm is not ewmh compliant. Hmmn, I think something else is going on. Normally there is a log produced in the console when firefox crashes in this way but I cannot reproduce the crash.

Hopefully the wmWindow check fixes everything but I would not be suprised if it did not...

(In reply to wekerbugs from comment #10)

(In reply to Martin Stránský [:stransky] from comment #7)

I think we also need check wmWindow != nullptr

That should only happen, when the wm is not ewmh compliant. Hmmn, I think something else is going on. Normally there is a log produced in the console when firefox crashes in this way but I cannot reproduce the crash.

Hopefully the wmWindow check fixes everything but I would not be suprised if it did not...

Forgot to say: That should only happen, when the wm is not ewmh compliant but pretends to be.

Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

We're still seeing this in 79 beta and 80 nightly, e.g. bp-d816d43c-ad1f-4d9e-8d95-141b50200702.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla79 → ---

looking at _g_log_abort | g_log_writer_default crash stat signatures there's a some general X error which leads to _gdk_x11_display_error_event() and it's triggered by various X routines. Given the asynchronous nature of X11 protocol we can't say XGetWindowProperty() really causes that.

This crash signature also appeared before but less often.
Now with every GetWorkspaceID() we invoke XGetWindowProperty() and can catch the error.

I really don't know how this can still crash otherwise.
Maybe the when the WM exits/crashes before Firefox?

One optimization that could reduce the frequency of crashes with XGetWindowProperty() is to only check DesktopUsesDynamicWorkspaces() once and save the result. That would fix it if it is something like the WM exiting before Firefox but idk if it is worth it.

(In reply to Razvan Maries from comment #12)

https://hg.mozilla.org/mozilla-central/rev/b3d9f2fe4edd

Looks like this regressed the number of crashes. Should we back this out and look for a different fix?

Should we backout the patches that landed? It seems the crash rate has gotten worse since they landed.

Flags: needinfo?(wekerbugs)

(In reply to Jim Mathies [:jimm] from comment #17)

(In reply to Razvan Maries from comment #12)

https://hg.mozilla.org/mozilla-central/rev/b3d9f2fe4edd

Looks like this regressed the number of crashes. Should we back this out and look for a different fix?

The linked changeset cannot regress the number of crashes, as it is only a nullpointer check. It backs out before the function is called that could lead to g_log_writer_default.

Equally https://phabricator.services.mozilla.com/D79728 cannot regress the number of crashes, as it just checks if integers are zero and also backs out of the function.

So if this should be backed out then all of them must go:
https://hg.mozilla.org/mozilla-central/rev?node=70c8e33b5710
https://hg.mozilla.org/mozilla-central/rev/00b1e2a78573e809ef86a70d72b9cb7253f0caea
https://hg.mozilla.org/mozilla-central/rev/b3d9f2fe4edd

I suspect that the patches are not the root cause but I cannot be sure as I cannot reproduce this and nobody seemed to report a bug for this.

Flags: needinfo?(wekerbugs)

So if this should be backed out then all of them must go:
https://hg.mozilla.org/mozilla-central/rev?node=70c8e33b5710

The other patches make sense as they're null checks, but this seems unrelated, or am I missing something?

(In reply to Gian-Carlo Pascutto [:gcp] from comment #20)

So if this should be backed out then all of them must go:
https://hg.mozilla.org/mozilla-central/rev?node=70c8e33b5710

The other patches make sense as they're null checks, but this seems unrelated, or am I missing something?

No this introduced the function, that invokes XGetWindowProperty which sometimes somehow leads to g_log_writer_default.
The other two patches introduce null checks and cannot regress the number of patches, therefore if something should be backed out, then all 3. That would of course regress Bug 1628749.

I think we should call GetWindowManagerName() only when XDG_CURRENT_DESKTOP is empty, which performs the check only for "suspected" desktops. We may also provide a preference to force disable the workspace restoration. I'll look at it this week.

(In reply to Martin Stránský [:stransky] from comment #22)

I think we should call GetWindowManagerName() only when XDG_CURRENT_DESKTOP is empty, which performs the check only for "suspected" desktops. We may also provide a preference to force disable the workspace restoration. I'll look at it this week.

As I've noted in other places XDG_CURRENT_DESKTOP is not for the WM but for the whole DE.
I think a preference to force disable the workspace restoration would then be a must.
Maybe detecting the WM wouldn't even be needed with this preference as it is rather ugly i.e. every "incompatible" WM must be added manually.

Pushed by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1e218d34eaf3 [Linux/Gtk] Workspace restore - get window manager name only when XDG_CURRENT_DESKTOP is missing and check/ship widget.disable-workspace-management to disable it, r=jhorak
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80

I haven't tested this, but doesn't the inner bool usesDynamicWorkspaces shadow the outer one now and therefore the mutter setting isn't respected? (I'm a bit surprised there's no compiler warning though if that's true...)

(In reply to Johannes Pfrang [:johnp] from comment #27)

I haven't tested this, but doesn't the inner bool usesDynamicWorkspaces shadow the outer one now and therefore the mutter setting isn't respected? (I'm a bit surprised there's no compiler warning though if that's true...)

Ahh, you're right. I'll fix that, Thanks.

Reports with this signature seem to be increasing again since July, with reports from all channels. Should we reopen this or file a new bug?

Flags: needinfo?(stransky)

Please file a new bug and attach the new signatures here.
Thanks.

Flags: needinfo?(stransky)
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: