win7 mingw build sometimes causes child process hangs on CreateFile call when creating breakpad ExceptionHandler
Categories: Toolkit :: Crash Reporting, defect
People: Reporter: kats, Unassigned
References: Blocks 1 open bug
Details
Backstory: when Botond tried landing bug 1611660, it caused a spike in win7 mingw reftest timeouts. See bug 1611660 comment 40 for a sample try push. I've been doing try pushes with additional logging to track this down, and the timeouts are caused by the content process simply hanging at various points. I've done many try pushes with lots of printfs to bisect exactly where the hangs are.
This bug is specifically about the content process hanging somewhere in this block of code. See, e.g., the output in this try push, where the last line logged comes from this printf, and the next printf a couple of lines down (after the ExceptionHandler creation) never shows up in the log.
I'll do more try pushes to narrow this down further, and file other bugs for other hang points as I identify them.
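The printf bisection described above can be sketched roughly as follows. This is only an illustration of the technique, not the logging actually used in the try pushes: `FormatCheckpoint`, `LogCheckpoint`, and the `CHECKPOINT` macro are hypothetical names, not part of Gecko or Breakpad.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not Gecko/Breakpad API): builds a log line that
// identifies the exact source location that was reached.
std::string FormatCheckpoint(const char* file, int line, const char* label) {
  return std::string("CHECKPOINT ") + file + ":" + std::to_string(line) +
         " " + label;
}

// Writes the checkpoint and flushes right away. stderr is usually
// unbuffered, but it may be redirected by a test harness, and a line still
// sitting in a buffer is lost if the process hangs and is later killed.
void LogCheckpoint(const char* file, int line, const char* label) {
  std::fprintf(stderr, "%s\n", FormatCheckpoint(file, line, label).c_str());
  std::fflush(stderr);
}

// Assumed convenience macro so call sites stay one line long.
#define CHECKPOINT(label) LogCheckpoint(__FILE__, __LINE__, label)
```

Placing one `CHECKPOINT("before")` ahead of a suspect call and one `CHECKPOINT("after")` behind it bounds the hang: if only the first line appears in the log, the hang is between the two, and the next push moves the checkpoints one level deeper.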
Comment 1 • 4 years ago
Not sure whether this belongs in the build system, in crashreporter, or somewhere else.
Comment 2 (Reporter) • 4 years ago
There's a bunch of these hangs which all seem to happen only in win7 mingw builds, so I was assuming it has something to do with the build itself. The one in bug 1642315, for example, I narrowed down to a specific _wfopen call which is unrelated to the crashreporter.
Comment 3 (Reporter) • 4 years ago
Based on the output in this try push, the hang is narrowed down to the two blocks of code at https://searchfox.org/mozilla-central/rev/5e4d4827aa005d031580d2d17a01bae1af138b2e/toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc#189-195,248-267 (the !IsOutOfProcess() conditional block isn't taken, so some code before it and some after it sits between the relevant printfs).
Comment 4 • 4 years ago
I recently fixed a number of races in the Breakpad code; considering that code is managing the connection between child processes and the crash generation server I wouldn't be surprised if it's one more of those.
Comment 5 (Reporter) • 4 years ago
This try push further narrows this down to the client->Register() call.
Comment 6 (Reporter) • 4 years ago
It's hanging on the ConnectToPipe call. One more step until we're down to raw Windows APIs.
Comment 7 (Reporter) • 4 years ago
Final step (log here) shows the hang is at this CreateFile call.
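Once a hang is pinned to a single blocking call like this, one way to flag it at runtime instead of waiting for the test harness to time out is a small watchdog around the call. The sketch below is portable C++ and does not fix or explain the hang in this bug; `CompletesWithin` is a hypothetical helper, and the callable passed to it merely stands in for something like the pipe connection.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical diagnostic helper: runs `call` on a detached thread and
// reports whether it returned within `timeout`. A false result means the
// call is still blocked on its own thread; nothing is cancelled.
// Assumes `call` does not throw (an escaping exception would terminate).
bool CompletesWithin(std::function<void()> call,
                     std::chrono::milliseconds timeout) {
  // The promise is shared so it outlives this function if the call hangs.
  auto done = std::make_shared<std::promise<void>>();
  std::future<void> finished = done->get_future();
  std::thread([call = std::move(call), done]() {
    call();
    done->set_value();
  }).detach();
  return finished.wait_for(timeout) == std::future_status::ready;
}
```

A caller could wrap the suspect step, for example `CompletesWithin([&] { /* connect to the pipe */ }, std::chrono::seconds(10))`, and log a message when it returns false. The thread is detached on purpose: joining it would simply move the hang into the watchdog.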
Comment 8 • 4 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #4)
> I recently fixed a number of races in the Breakpad code; considering that code is managing the connection between child processes and the crash generation server I wouldn't be surprised if it's one more of those.
Yeah, I see no reason to believe any of this has anything to do with the build system (and "this series of data races only affects one particular architecture" is not a compelling argument to that end, especially when all of the buggy code seems to do inherently risky stuff involving threading). I've done my best to re-assign the bugs to appropriate owners.
Comment hidden (Intermittent Failures Robot)
Comment 10 • 4 years ago
The severity field is not set for this bug.
:gsvelto, could you have a look please?
For more information, please visit auto_nag documentation.