Closed Bug 1509234 Opened 6 years ago Closed 5 years ago

Intermittent Mn | application crashed [@ mozilla::a11y::FocusManager::ProcessFocusEvent(mozilla::a11y::AccEvent*)]

Categories

(Core :: Disability Access APIs, defect, P3)

x86_64
macOS
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: Jamie)

References

Details

(Keywords: crash, intermittent-failure)

Crash Data

Attachments

(3 files)

This looks like a null pointer crash with the address close to be 0x0: 00:41:54 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 00:41:54 INFO - Crash address: 0x120 Here the top 20 frames from the crashing thread: 00:41:54 INFO - Thread 0 (crashed) 00:41:54 INFO - 0 XUL!mozilla::a11y::FocusManager::ProcessFocusEvent(mozilla::a11y::AccEvent*) [nsCOMPtr.h:28497e7f30ae120d245f3898912fc845ccfe99f3 : 919 + 0x0] 00:41:54 INFO - rax = 0x97000ce6033eafe4 rdx = 0x00007fff9174e79d 00:41:54 INFO - rcx = 0x0000000000000000 rbx = 0x000000013496e340 00:41:54 INFO - rsi = 0x0000020600000032 rdi = 0x0000000000000000 00:41:54 INFO - rbp = 0x00007fff52621c80 rsp = 0x00007fff52621c30 00:41:54 INFO - r8 = 0x00007fff9174e67d r9 = 0x00007fff9174e67e 00:41:54 INFO - r10 = 0x00007f948401ed20 r11 = 0x00007fff7c630740 00:41:54 INFO - r12 = 0x00000001239b0040 r13 = 0x00000001239b0040 00:41:54 INFO - r14 = 0x000000013496e300 r15 = 0x0000000000000000 00:41:54 INFO - rip = 0x000000011290fab9 00:41:54 INFO - Found by: given as instruction pointer in context 00:41:54 INFO - 1 XUL!mozilla::a11y::EventQueue::ProcessEventQueue() [EventQueue.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 308 + 0xb] 00:41:54 INFO - rbp = 0x00007fff52621cc0 rsp = 0x00007fff52621c90 00:41:54 INFO - rip = 0x000000011290f0a2 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 2 XUL!mozilla::a11y::NotificationController::WillRefresh(mozilla::TimeStamp) [NotificationController.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 909 + 0x8] 00:41:54 INFO - rbp = 0x00007fff52621ea0 rsp = 0x00007fff52621cd0 00:41:54 INFO - rip = 0x00000001129191d0 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 3 XUL!nsRefreshDriver::Tick(mozilla::TimeStamp) [nsRefreshDriver.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 1878 + 0x9] 00:41:54 INFO - rbp = 0x00007fff52622150 rsp = 0x00007fff52621eb0 00:41:54 INFO - rip = 0x00000001116e4bb8 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 4 XUL!mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) [nsRefreshDriver.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 326 + 0x8] 00:41:54 INFO - rbp = 0x00007fff52622180 rsp = 0x00007fff52622160 00:41:54 INFO - rip = 0x00000001116ec573 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 5 XUL!mozilla::RefreshDriverTimer::Tick(mozilla::TimeStamp) [nsRefreshDriver.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 319 + 0xb] 00:41:54 INFO - rbp = 0x00007fff526221f0 rsp = 0x00007fff52622190 00:41:54 INFO - rip = 0x00000001116ec419 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 6 XUL!mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::TimeStamp) [nsRefreshDriver.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 760 + 0xb] 00:41:54 INFO - rbp = 0x00007fff52622230 rsp = 0x00007fff52622200 00:41:54 INFO - rip = 0x00000001116edbe7 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 7 XUL!mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::ParentProcessVsyncNotifier::Run() [nsRefreshDriver.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 514 + 0x5] 00:41:54 INFO - rbp = 0x00007fff52622270 rsp = 0x00007fff52622240 00:41:54 INFO - rip = 0x00000001116eb25b 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 8 XUL!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 1244 + 0x6] 00:41:54 INFO - rbp = 0x00007fff526227d0 rsp = 0x00007fff52622280 00:41:54 INFO - rip = 0x000000010dcf849c 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 9 XUL!NS_ProcessPendingEvents(nsIThread*, unsigned int) [nsThreadUtils.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 472 + 0xf] 00:41:54 INFO - rbp = 0x00007fff52622810 rsp = 0x00007fff526227e0 00:41:54 INFO - rip = 0x000000010dcf5951 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 10 XUL!nsBaseAppShell::NativeEventCallback() [nsBaseAppShell.cpp:28497e7f30ae120d245f3898912fc845ccfe99f3 : 99 + 0xa] 00:41:54 INFO - rbp = 0x00007fff52622840 rsp = 0x00007fff52622820 00:41:54 INFO - rip = 0x000000011146512f 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 11 XUL!nsAppShell::ProcessGeckoEvents(void*) [nsAppShell.mm:28497e7f30ae120d245f3898912fc845ccfe99f3 : 463 + 0x8] 00:41:54 INFO - rbp = 0x00007fff526228a0 rsp = 0x00007fff52622850 00:41:54 INFO - rip = 0x00000001114d621b 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 12 CoreFoundation + 0x80a01 00:41:54 INFO - rbp = 0x00007fff526228b0 rsp = 0x00007fff526228b0 00:41:54 INFO - rip = 0x00007fff958daa01 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 13 CoreFoundation + 0x72b8d 00:41:54 INFO - rbp = 0x00007fff52622910 rsp = 0x00007fff526228c0 00:41:54 INFO - rip = 0x00007fff958ccb8d 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 14 CoreFoundation + 0x721bf 00:41:54 INFO - rbp = 0x00007fff526235f0 rsp = 0x00007fff52622920 00:41:54 INFO - rip = 0x00007fff958cc1bf 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 15 CoreFoundation + 0x71bd8 00:41:54 INFO - rbp = 0x00007fff52623650 rsp = 0x00007fff52623600 00:41:54 INFO - rip = 0x00007fff958cbbd8 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 16 HIToolbox + 0x3256f 00:41:54 INFO - rbp = 0x00007fff52623690 rsp = 0x00007fff52623660 00:41:54 INFO - rip = 0x00007fff8d90b56f 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 17 HIToolbox + 0x322ea 00:41:54 INFO - rbp = 0x00007fff52623710 rsp = 0x00007fff526236a0 00:41:54 INFO - rip = 0x00007fff8d90b2ea 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 18 HIToolbox + 0x3212b 00:41:54 INFO - rbp = 0x00007fff52623730 rsp = 0x00007fff52623720 00:41:54 INFO - rip = 0x00007fff8d90b12b 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 19 AppKit + 0x918ab 00:41:54 INFO - rbp = 0x00007fff52623ba0 rsp = 0x00007fff52623740 00:41:54 INFO - rip = 0x00007fff90d708ab 00:41:54 INFO - Found by: previous frame's frame pointer 00:41:54 INFO - 20 AppKit + 0x90e58 00:41:54 INFO - rbp = 0x00007fff52623e40 rsp = 0x00007fff52623bb0 00:41:54 INFO - rip = 0x00007fff90d6fe58 00:41:54 INFO - Found by: previous frame's frame pointer
Note that there is an accessible assertion a couple of seconds before which could be related: https://treeherder.mozilla.org/logviewer.html#?job_id=213088168&repo=try&lineNumber=7395-7397 > 00:41:34 INFO - 2018-11-21 00:41:34.765 firefox[703:5872] *** Assertion failure in -[mozRootAccessible representedView], /builds/worker/workspace/build/src/accessible/mac/mozDocAccessible.mm:100 > 00:41:34 INFO - 2018-11-21 00:41:34.766 firefox[703:5872] Mozilla has caught an Obj-C exception [NSInternalInconsistencyException: can't return root accessible's native parallel view.] > 00:41:34 INFO - 2018-11-21 00:41:34.766 firefox[703:5872] Generating stack trace for Obj-C exception... The crash will happen in some of the Marionette unit tests with the patch on bug 1507803 applied. Please also note that Marionette enables the focusmanager test mode.
Maybe this is related to immediately focusing the newly opened window: > // Open a window, wait for it to receive focus > let win = window.openDialog(url, null, "chrome,centerscreen"); > > // Bug 1507803 - Missing focus/activate event when tests are > // run in the background. Focus manually for now. > win.focus(); Yura, may you be able to help us with that?
Flags: needinfo?(yzenevich)
Delaying the call to `focus()` indeed helps here and removes the crash, but also increases the number of failures as seen for bug 1504201: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b6cddedb7dd15690224cf1c1594ad6f9caf87b9
OS: Unspecified → Mac OS X
Hardware: Unspecified → x86_64
Hmm, this is an area I'm not familiar with.. My suspicion it has something to do with https://searchfox.org/mozilla-central/source/widget/cocoa/nsCocoaWindow.mm#3450-3499 that results in the call to mozDocAccessible@representedView. Is there some sort of "ready" event for the window that is being open before you call focus (something like "load") to see if focusing after helps mitigate this issue?
Flags: needinfo?(yzenevich)
Good idea. There is indeed a `load` event which I might have to wait for. Lets see if this is successful: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c4171ac0e71075f02ed2364ef33dd755d47711be
Waiting for the load event doesn't help, but a `setTimeout()` to wait for the next tick seems to do the trick: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bd595c43f7ff5fb922f40d3ddfe44508aee6e319&selectedJob=213431098&searchStr=debug

Is there someone who could have a look at this crash? The patch with the workaround just landed today, and it would be great when I could get rid of it.

Flags: needinfo?(surkov.alexander)

let win = window.openDialog(url, null, "chrome,centerscreen");
win.focus();

a11y certainly should handle this case fine.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #1)

This looks like a null pointer crash with the address close to be 0x0:

00:41:54 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
00:41:54 INFO - Crash address: 0x120

Here the top 20 frames from the crashing thread:

00:41:54 INFO - Thread 0 (crashed)
00:41:54 INFO - 0
XUL!mozilla::a11y::FocusManager::ProcessFocusEvent(mozilla::a11y::AccEvent*)
[nsCOMPtr.h:28497e7f30ae120d245f3898912fc845ccfe99f3 : 919 + 0x0]

which line exactly it crashes?

Flags: needinfo?(surkov.alexander)

It is the call to win.focus().

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #14)

It is the call to win.focus().

I meant a line in FocusManager::ProcessFocusEvent where it crashes, the stack refers to a line in nsCOMPtr.h file.

Oh, I see. So where-ever this call goes to:
https://searchfox.org/mozilla-central/rev/c035ee7d3a5cd6913e7143e1bce549ffb4a566ff/accessible/base/EventQueue.cpp#289

I tried to reproduce it locally but without success. Maybe it got fixed meanwhile.

To verify that I pushed a try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=93aad6c05f4feca14f4d1b37671f36b73b681732

There was actually a new failure earlier today as reported via bug 1522447. It also shows an assertion before. Here the full output from the log:

02:56:47 INFO - 1548327407116 Marionette DEBUG 200 -> [0,11,"WebDriver:NewWindow",{"type":"window","focus":false}]
02:56:47 INFO - ++DOCSHELL 0x124913000 == 5 [pid = 1889] [id = {63690d04-d8a1-5147-bb6b-4f6b43d58363}]
02:56:47 INFO - ++DOMWINDOW == 9 (0x12fcaf000) [pid = 1889] [serial = 10] [outer = 0x0]
02:56:47 INFO - ++DOMWINDOW == 10 (0x12ff72800) [pid = 1889] [serial = 11] [outer = 0x12fcaf000]
02:56:47 INFO - [Child 1896, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 299
02:56:47 INFO - [Child 1896, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 299
02:56:47 INFO - [Child 1896, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 299
02:56:47 INFO - [Child 1896, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 299
02:56:47 INFO - 1548327407153 Marionette TRACE Received DOM event activate for [object ChromeWindow]
02:56:47 INFO - 1548327407155 Marionette TRACE Received DOM event focus for [object HTMLDocument]
02:56:47 INFO - 2019-01-24 02:56:47.158 firefox[1889:12870] *** Assertion failure in -[mozRootAccessible representedView], /builds/worker/workspace/build/src/accessible/mac/mozDocAccessible.mm:88
02:56:47 INFO - 2019-01-24 02:56:47.158 firefox[1889:12870] Mozilla has caught an Obj-C exception [NSInternalInconsistencyException: can't return root accessible's native parallel view.]
02:56:47 INFO - 2019-01-24 02:56:47.159 firefox[1889:12870] Generating stack trace for Obj-C exception...

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #16)

To verify that I pushed a try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=93aad6c05f4feca14f4d1b37671f36b73b681732

The crash rate is still high. You can see two of those crashes on that try build. I hope that helps you.

Flags: needinfo?(surkov.alexander)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #17)

There was actually a new failure earlier today as reported via bug 1522447. It also shows an assertion before. Here the full output from the log:

02:56:47 INFO - 1548327407153 Marionette TRACE Received DOM event activate for [object ChromeWindow]
02:56:47 INFO - 1548327407155 Marionette TRACE Received DOM event focus for [object HTMLDocument]
02:56:47 INFO - 2019-01-24 02:56:47.158 firefox[1889:12870] *** Assertion failure in -[mozRootAccessible representedView], /builds/worker/workspace/build/src/accessible/mac/mozDocAccessible.mm:88
02:56:47 INFO - 2019-01-24 02:56:47.158 firefox[1889:12870] Mozilla has caught an Obj-C exception [NSInternalInconsistencyException: can't return root accessible's native parallel view.]
02:56:47 INFO - 2019-01-24 02:56:47.159 firefox[1889:12870] Generating stack trace for Obj-C exception...

Is it correct assumption that the assertion is on the crash stack below?

Thread 0 (crashed)
00:14:02 INFO - 0 XUL!mozilla::a11y::FocusManager::ProcessFocusEvent(mozilla::a11y::AccEvent*) [nsCOMPtr.h:956bd26eec5a9174753b63931d1a927a59cd4716 : 823 + 0x0]
00:14:02 INFO - rax = 0x1800ace25e42b5e6 rdx = 0x00007fff9311679d
00:14:02 INFO - rcx = 0x0000000000000000 rbx = 0x0000000121713b50
00:14:02 INFO - rsi = 0x0000020600000032 rdi = 0x0000000000000000
00:14:02 INFO - rbp = 0x00007fff5d0acbb0 rsp = 0x00007fff5d0acb60
00:14:02 INFO - r8 = 0x00007fff9311667d r9 = 0x00007fff9311667e
00:14:02 INFO - r10 = 0x00007fa1ba824920 r11 = 0x00007fff79809740
00:14:02 INFO - r12 = 0x0000000120c52900 r13 = 0x0000000120c52900
00:14:02 INFO - r14 = 0x0000000121713700 r15 = 0x0000000000000000
00:14:02 INFO - rip = 0x0000000108395aa9
00:14:02 INFO - Found by: given as instruction pointer in context
00:14:02 INFO - 1 XUL!mozilla::a11y::EventQueue::ProcessEventQueue() [EventQueue.cpp:956bd26eec5a9174753b63931d1a927a59cd4716 : 289 + 0xb]
00:14:02 INFO - rbp = 0x00007fff5d0acc00 rsp = 0x00007fff5d0acbc0
00:14:02 INFO - rip = 0x0000000108395042

Is it possible to extract line in FocusManager::ProcessFocusEvent from the stack above? Having an actual crash line in that function might explain what happens here.

Flags: needinfo?(surkov.alexander)

(In reply to alexander :surkov (:asurkov) from comment #19)

Is it possible to extract line in FocusManager::ProcessFocusEvent from the stack above? Having an actual crash line in that function might explain what happens here.

I only see those crashes via Treeherder and jobs as run via TaskCluster but not locally. So there is nothing I could do here.

Also in regards of the line of code there is nothing I can add more than I already did in comment 16. I'm not a developer so you are way more familiar with that code.

Flags: needinfo?(surkov.alexander)

Marco, is there anything you can help us with?

Flags: needinfo?(mzehe)

Hm, without the actual thing failing, like Surkov asked, it's mostly guess work. My current guess is that the call to FocusMgr() returns nullptr, which would indicate that the accessibility service isn't ready yet, since FocusMgr() in this case is a synonym for the global reference to the AccessibilityService instance. But the fact that the stack doesn't really indicate this explicitly is worrying. Also that it only seems to happen on Mac, and only on TaskCluster. The fact that a timeout you input fixes the issue, AKA gives the service enough time to instanciate, seems to confirm this hypothesis.

Flags: needinfo?(mzehe)

Moving these bugs (intermittent test failures with crashes) out of P5.

Priority: P5 → --

(In reply to Marco Zehe (:MarcoZ) [Out sick until further notice, NI and reviews will not be answered] from comment #25)

Hm, without the actual thing failing, like Surkov asked, it's mostly guess work. My current guess is that the call to FocusMgr() returns nullptr, which would indicate that the accessibility service isn't ready yet, since FocusMgr() in this case is a synonym for the global reference to the AccessibilityService instance. But the fact that the stack doesn't really indicate this explicitly is worrying. Also that it only seems to happen on Mac, and only on TaskCluster. The fact that a timeout you input fixes the issue, AKA gives the service enough time to instanciate, seems to confirm this hypothesis.

it'd be good to check the hypothesis by adding an assertion and moving out a workaround from comment #7

Flags: needinfo?(surkov.alexander)

You would have to apply:
https://hg.mozilla.org/try/rev/8bcb0876973bdeb6146944dc493d0d093f9bc7d1

Then as the try build from comment 16 shows it is very easy to reproduce. Alexander or Marco, could one of you do that?

Flags: needinfo?(surkov.alexander)

I run a patch [1] via try server marionette tests [2], no failures. Henrik, could you double check my observations, did I ported your patch correctly (it didn't apply cleanly)? If so, then let's close it as worksforme.

[1] https://hg.mozilla.org/try/rev/fd7d1dff65f5fe48dd7551d7687bc6f2e69df969
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=496bfe760dd533f9d9662a63c99a59bc343774f1

Flags: needinfo?(surkov.alexander) → needinfo?(hskupin)

It is but it uses full builds instead of artifact builds. To be completely sure I pushed with my config:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f406536a7e9fea29734f6be83cb08544524da344

If those tests are green we can indeed get the bug closed. I will file a follow-up for removing this code.

Flags: needinfo?(hskupin)

Please note that Marionette tests now run on MacOS 10.14 and not 10.10 anymore. Maybe this made the crash to go away...

The crash indeed only happens on MacOS 10.10 but not 10.14. If you think that we shouldn't fix it even we still support 10.10 lets mark this bug as wontfix.

Flags: needinfo?(surkov.alexander)
No longer blocks: 1522447

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] (away July 6th - July 19th) from comment #34)

The crash indeed only happens on MacOS 10.10 but not 10.14. If you think that we shouldn't fix it even we still support 10.10 lets mark this bug as wontfix.

Technically 10.10 os x crash indicates the a11y focus handling code has a flaw that may be triggerred under certain conditions. So it'd be better to fix the issue, however I'd say it has to be Jamie's call whether it's worth it or not to investigate it further.

Flags: needinfo?(surkov.alexander)

(In reply to alexander :surkov (:asurkov) from comment #36)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] (away July 6th - July 19th) from comment #34)

The crash indeed only happens on MacOS 10.10 but not 10.14. If you think that we shouldn't fix it even we still support 10.10 lets mark this bug as wontfix.

Technically 10.10 os x crash indicates the a11y focus handling code has a flaw that may be triggerred under certain conditions. So it'd be better to fix the issue, however I'd say it has to be Jamie's call whether it's worth it or not to investigate it further.

So lets get the needinfo set for him. Please also note that this only happens for debug builds.

Flags: needinfo?(jteh)

I agree we ideally should investigate and fix this, but given that this is Mac debug only, doesn't really impact anything any more and we have no one who is able to work on mac a11y, I'm wontfixing this. We can always reopen if it becomes a real problem again.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(jteh)
Resolution: --- → WONTFIX

Ok, then I will remove the workaround for Marionette tests which run on 10.14, so that we have this code path tested. Thanks

Since I landed my patch for bug 1563040 the crash revealed even with MacOS 10.14! So it isn't a 10.10 only issue. See the list of failures from last week:

https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1565391&startday=2019-07-15&endday=2019-07-21&tree=all

I will backout my patch for now, and we have to wait for a fix on this bug.

Could someone follow-up on comment 27 and add such an assertion?

Status: RESOLVED → REOPENED
Flags: needinfo?(jteh)
Resolution: WONTFIX → ---

I'm not sure what changed (possibly even an improvement in the toolchain), but the recent stacks show the first frame of the crash in nsFocusManager.cpp:

22:54:58 INFO - 0 XUL!mozilla::a11y::FocusManager::ProcessFocusEvent(mozilla::a11y::AccEvent*) [FocusManager.cpp:5e0d7f8b5620cbe84280e877b4d14c442925d1ba : 353 + 0x0]

Reading the comments, I was already starting to suspect this couldn't be a null pointer return from FocusMgr() - FocusMgr() returns a raw pointer, not a smart pointer (as inferred by the frame being in nsCOMPtr.h). However, the new crash point seems more realistic.

That said, the new point refers to this line:

352       DocAccessible* targetDocument = target->Document();
353       Accessible* anchorJump = targetDocument->AnchorJump();

That then suggests that target->Document() returns null, which should be impossible:

  1. Accessible::Document() just returns mDoc.
  2. The only time mDoc can be null is if the accessible is shut down.
  3. If the accessible is shut down, it should be defunct.
  4. If the accessible is defunct, we wouldn't have called ProcessFocusEvent() in the first place, since EventQueue::ProcessEventQueue() checks IsDefunct() before calling ProcessFocusEvent.

The only thing I can think is that the accessible somehow becomes defunct during the call to ProcessFocusEvent, perhaps during nsEventShell::FireEvent. I'll throw in some assertions and see what we get.

Flags: needinfo?(jteh)
Keywords: leave-open
Pushed by jteh@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f61413f491f3 Add some assertions to help debug crashes. r=yzen

There are a couple of failing jobs now with indeed a defunct assertion:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=258263964&repo=try&lineNumber=15845-15874

[task 2019-07-25T09:28:35.841Z] 09:28:35 INFO - Assertion failure: !target->IsDefunct(), at /builds/worker/workspace/build/src/accessible/base/FocusManager.cpp:354
[task 2019-07-25T09:28:35.841Z] 09:28:35 INFO - #01: WebPGetColorPalette[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x513cc52]
[task 2019-07-25T09:28:35.841Z] 09:28:35 INFO - #02: WebPGetColorPalette[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x5147359]
[task 2019-07-25T09:28:35.842Z] 09:28:35 INFO - #03: mac_plugin_interposing_child_OnShowCursor[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x402e8cc]
[task 2019-07-25T09:28:35.842Z] 09:28:35 INFO - #04: mac_plugin_interposing_child_OnShowCursor[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x4036356]
[task 2019-07-25T09:28:35.842Z] 09:28:35 INFO - #05: mac_plugin_interposing_child_OnShowCursor[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x40361bb]
[task 2019-07-25T09:28:35.842Z] 09:28:35 INFO - #06: mac_plugin_interposing_child_OnShowCursor[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x4037f02]
[task 2019-07-25T09:28:35.842Z] 09:28:35 INFO - #07: mac_plugin_interposing_child_OnShowCursor[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x4034f9d]
[task 2019-07-25T09:28:35.842Z] 09:28:35 INFO - #08: NS_NewLocalFileWithCFURL[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x1582c8]
[task 2019-07-25T09:28:35.842Z] 09:28:35 INFO - #09: NS_NewLocalFileWithCFURL[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x155061]
[task 2019-07-25T09:28:35.842Z] 09:28:35 INFO - #10: mac_plugin_interposing_child_OnShowCursor[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x3d98d4f]
[task 2019-07-25T09:28:35.842Z] 09:28:35 INFO - #11: mac_plugin_interposing_child_OnShowCursor[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x3e0e3b9]
[task 2019-07-25T09:28:35.843Z] 09:28:35 INFO - #12: CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x58083]
[task 2019-07-25T09:28:35.843Z] 09:28:35 INFO - #13: __CFRunLoopDoSource0[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x58029]
[task 2019-07-25T09:28:35.843Z] 09:28:35 INFO - #14: __CFRunLoopDoSources0[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x3b9eb]
[task 2019-07-25T09:28:35.843Z] 09:28:35 INFO - #15: __CFRunLoopRun[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x3afb5]
[task 2019-07-25T09:28:35.843Z] 09:28:35 INFO - #16: CFRunLoopRunSpecific[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x3a8be]
[task 2019-07-25T09:28:35.843Z] 09:28:35 INFO - #17: RunCurrentEventLoopInMode[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0xa96b]
[task 2019-07-25T09:28:35.843Z] 09:28:35 INFO - #18: ReceiveNextEventCommon[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0xa6a5]
[task 2019-07-25T09:28:35.843Z] 09:28:35 INFO - #19: _BlockUntilNextEventMatchingListInModeWithFilter[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0xa436]
[task 2019-07-25T09:28:35.843Z] 09:28:35 INFO - #20: _DPSNextEvent[/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x1a987]
[task 2019-07-25T09:28:35.844Z] 09:28:35 INFO - #21: -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:][/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x1971f]
[task 2019-07-25T09:28:35.844Z] 09:28:35 INFO - #22: mac_plugin_interposing_child_OnShowCursor[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x3e0d37e]
[task 2019-07-25T09:28:35.844Z] 09:28:35 INFO - #23: -[NSApplication run][/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x1383c]
[task 2019-07-25T09:28:35.844Z] 09:28:35 INFO - #24: mac_plugin_interposing_child_OnShowCursor[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x3e0ed4f]
[task 2019-07-25T09:28:35.844Z] 09:28:35 INFO - #25: workerlz4_maxCompressedSize[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x548bc6e]
[task 2019-07-25T09:28:35.844Z] 09:28:35 INFO - #26: RecordReplayInterface_DefineRecordReplayControlObject[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x562060c]
[task 2019-07-25T09:28:35.844Z] 09:28:35 INFO - #27: RecordReplayInterface_DefineRecordReplayControlObject[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x56218ca]
[task 2019-07-25T09:28:35.844Z] 09:28:35 INFO - #28: RecordReplayInterface_DefineRecordReplayControlObject[/Users/cltbld/tasks/task_1564046560/build/application/Firefox NightlyDebug.app/Contents/MacOS/XUL +0x562233c]

I hope that this already helps given that Marionette fails in symbolicating the assertion stack. See bug 1348961.

Flags: needinfo?(jteh)

It seems an Accessible can be shut down during a call to nsEventShell::FireEvent. I'm not really sure how or why that happens, but it's going to be pretty difficult to debug given the circumstances under which this occurs, so I'm just going to check for it and return early.

Assignee: nobody → jteh
Status: REOPENED → ASSIGNED
Flags: needinfo?(jteh)

In FocusManager::ProcessFocusEvent, after firing the event, we get the anchor jump from the target Accessible's document.
However, it seems the Accessible can be shut down during the call to nsEventShell::FireEvent, resulting in a crash when we try to get the anchor jump.
Protect against this by checking whether the target is defunct after firing the event.

Blocks: 1569193
Priority: -- → P3
Pushed by jteh@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f869dc3ce4a9 Return early if accessible dies while processing a focus event to prevent crashes. r=eeejay

This is fixed now. James, do you think that we should backport? If yes, only beta or also 68ESR?

Flags: needinfo?(jteh)
Keywords: leave-open
Status: ASSIGNED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #56)

This is fixed now. James, do you think that we should backport? If yes, only beta or also 68ESR?

I don't see any crashes with this signature going back a couple of months, so I think (for now at least) that it's limited to tests. Thus, I'm not sure it's worth uplifting this (unless this is causing problems for tests on beta and this is an issue that you need fixed).

Flags: needinfo?(jteh)

No, I'm fine with the workaround we have on beta and ESR68. In that case lets not worry about uplifts. Thanks for checking crashstats through!

Hi Henrik. Is it possible that this fixed the noise between 20:00 and 4:00 in the attached graph? The highlighted datapoint is where this fix landed.
Impropperly said fixed, the noise between this interval was 2k - 2.6k and after this landed the noise is 2.3k - 2.6k

Flags: needinfo?(hskupin)

I doubt so, but please ask the assignee of this bug.

Flags: needinfo?(hskupin) → needinfo?(jteh)

I can't see this graph (I'm totally blind and this is a screen shot, even assuming the graph was accessible in the first place which is unlikely) and thus I have no idea what's being talked about here.

Flags: needinfo?(jteh)

Oh, sorry. I didn't pay attention that this was a screenshot. Alexandru, can you please help out with a real graph? Hopefully perfherder is accessible and would allow James to get the relevant information.

Flags: needinfo?(alexandru.ionescu)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #63)

Oh, sorry. I didn't pay attention that this was a screenshot. Alexandru, can you please help out with a real graph? Hopefully perfherder is accessible and would allow James to get the relevant information.

Here is the graph

Flags: needinfo?(alexandru.ionescu)

Unfortunately, this graph is inaccessible to me as a screen reader user. I'm not really sure what is meant by "noise" here. How is that measured? Are we talking about some test running faster or similar?

In any case, I doubt this change fixed any performance issues. Before any patches, we were crashing. After my first patch to add assertions, we would have seen assertions and then a crash. Now, both the assertions and the crash are gone.

The graphs are from Raptor test jobs which measure page load times. In this case specifically for Instagram. The spike for the load time which started between July 31st and Aug 1st kept at a higher level until Aug 5th, then values dropped back to normal levels. I agree with James, and second what I already mentioned in comment 61 that this should be unrelated.

Also it looks like that perfherder needs a lot of love for accessibility.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: