IOUtils.exists causes intermittent crashes in Talos' sessionrestore_no_auto_restore test when called via XPIInstall.jsm
Categories
(Toolkit Graveyard :: OS.File, defect, P2)
Tracking
(firefox97 fixed)
Tracking | Status | |
---|---|---|
firefox97 | --- | fixed |
People
(Reporter: standard8, Unassigned)
References
Details
(Whiteboard: [fixed by bug 1744794])
Bug 1733540 tried to replace OS.File usage in XPIInstall.jsm with IOUtils. When that landed, it intermittently caused crashes (bug 1736189) in Talos' sessionrestore_no_auto_restore tests.
I have tried simplifying the patch to handle just the change needed to drop OS.File from the early startup process (prior to the first browser window being drawn). I tested simply replacing the one instance of the OS.File.exists
call that startup currently hits: https://hg.mozilla.org/try/rev/d6432cc4ab48f4ed37e8f13b9b1685165db30fa7
That replacement still caused crashes.
On Linux, the crash is a SIGABRT, and the top frames are:
libxul.so!PollWrapper(_GPollFD*, unsigned int, int) [nsAppShell.cpp:a4f4332a57e88004402b6a9a7f401c0fa8e078e9 : 61 + 0xf]
libglib-2.0.so.0!g_main_context_iterate.isra.26 [gmain.c : 3897 + 0x24]
libglib-2.0.so.0!g_main_context_iteration [gmain.c : 3963 + 0x14]
libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:a4f4332a57e88004402b6a9a7f401c0fa8e078e9 : 1091 + 0x18d]
libxul.so!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [MessagePump.cpp:a4f4332a57e88004402b6a9a7f401c0fa8e078e9 : 107 + 0x2e]
On Windows, it says "No crash". Though the top frames are:
win32u.dll + 0x96e4
user32.dll + 0x2029d
xul.dll!static mozilla::widget::WinUtils::WaitForMessage(unsigned long) [WinUtils.cpp:a4f4332a57e88004402b6a9a7f401c0fa8e078e9 : 832 + 0x1d]
xul.dll!nsAppShell::ProcessNextNativeEvent(bool) [nsAppShell.cpp:a4f4332a57e88004402b6a9a7f401c0fa8e078e9 : 739 + 0xa]
xul.dll!nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) [nsBaseAppShell.cpp:a4f4332a57e88004402b6a9a7f401c0fa8e078e9 : 259 + 0x42]
xul.dll!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:a4f4332a57e88004402b6a9a7f401c0fa8e078e9 : 1091 + 0x52]
So in both cases, it looks like this happens whilst we're waiting for events.
Reporter | ||
Comment 1•3 years ago
|
||
This turns out to be very specific. If I remove just the exists
change from the patch in bug 1733540, then there's no crashes (for this test at least): https://treeherder.mozilla.org/jobs?repo=try&revision=34305df8b0cd74e500a3de027364518fcc5784a2
Reporter | ||
Comment 2•3 years ago
|
||
Hi Barret, I'm wondering if you might be able to take a look at this crash?
I believe that making this line use IOUtils
is enough to trigger the crash.
I haven't been able to reproduce this locally yet though.
One suggestion from one of the bugs was that this could potentially be due to some of the async shutdown blocking.
Comment 3•3 years ago
|
||
It could be due to async shutdown blocking, but the async shutdown barrier code warns after 10 seconds if there is a blocker that hasn't yet resolved, which I don't see in the log.
We could add some logging to IOUtils around its async shutdown to see if we get stuck inside.
Comment 5•3 years ago
|
||
One of the jobs failed with a crash but it does not seem to be a shutdown crash at all, so I've re-triggered the jobs to run 5x.
Reporter | ||
Comment 6•3 years ago
|
||
The guess about the fact it was shutdown related may have been wrong. The log seems similar to some of the others that we saw previously - note that Mac & Windows didn't seem to have the crash stack information whereas Linux did.
Thinking about it given that the line in XPIInstall.jsm is triggered on startup, this is probably more likely to be startup, though I don't get why it would be intermittent and only show up on Talos.
Comment 7•3 years ago
|
||
I've fired off another try push with very verbose logging around all dispatching and resolving/rejecting of IOUtils operations https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a662849e723ad7224e9fcfdaf062d8c57b43ccf
Comment 8•3 years ago
|
||
I took a peek at the generated logs and it seems like every call to IOUtils.exists() is resolving or rejecting appropriately -- we aren't getting stuck in the event queue, so something even weirder is going on.
It would be good if we could get a reproduction of this via rr to run on pernosco and see what is going on.
Comment 9•3 years ago
|
||
The severity field is not set for this bug.
:barret, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Updated•3 years ago
|
Reporter | ||
Comment 10•3 years ago
|
||
I believe the main issue in the Talos test has been fixed by bug 1744794.
I've just filed bug 1745430 on potentially adding a test for shutdown during startup.
Hence calling this fixed by bug 1744794.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•2 years ago
|
Description
•