Open Bug 1642331 Opened 4 years ago Updated 4 years ago

win7 mingw build sometimes causes child process hangs on CreateFile call when creating breakpad ExceptionHandler

Categories

(Toolkit :: Crash Reporting, defect)

People

(Reporter: kats, Unassigned)

References

(Blocks 1 open bug)

Details

Backstory: when Botond tried landing bug 1611660, it caused a spike in win7 mingw reftest timeouts. See bug 1611660 comment 40 for a sample try push. I've been doing try pushes with additional logging to track this down, and the timeouts are caused by the content process simply hanging at various points. I've done many try pushes with lots of printfs to bisect exactly where the hangs are.

This bug is specifically about the content process hanging somewhere in this block of code. See e.g. the output in this try push where the last line logged comes from this printf and the next one a couple of lines down (after the ExceptionHandler creation) doesn't show up in the log.
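For anyone repeating this kind of printf bisection, here is a minimal sketch of a checkpoint helper, not the actual logging used in the try pushes. The names `FormatCheckpoint`, `LogCheckpoint`, `HANG_CHECKPOINT` and the `[hang-bisect]` prefix are hypothetical. The important part is the explicit flush: without it, a buffered line can be lost when the process hangs, and the last line in the log no longer marks the last statement reached.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds one checkpoint line such as
// "[hang-bisect] exception_handler.cc:189 before Register".
std::string FormatCheckpoint(const char* file, int line, const char* label) {
  return std::string("[hang-bisect] ") + file + ":" + std::to_string(line) +
         " " + label;
}

// Writes the checkpoint to stderr and flushes immediately, so the line
// reaches the log even if the process hangs on the very next statement.
void LogCheckpoint(const char* file, int line, const char* label) {
  std::fprintf(stderr, "%s\n", FormatCheckpoint(file, line, label).c_str());
  std::fflush(stderr);
}

// Convenience macro (hypothetical) that records the call site.
#define HANG_CHECKPOINT(label) LogCheckpoint(__FILE__, __LINE__, label)

// Usage sketch: bracket the suspect statement with two checkpoints.
void ExampleUsage() {
  HANG_CHECKPOINT("before suspect call");
  // ... the call under suspicion would go here ...
  HANG_CHECKPOINT("after suspect call");
}
```

If the "before" line shows up in the log and the "after" line does not, the hang is somewhere between the two checkpoints, and the next try push can move them closer together.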

I'll do more try pushes to narrow this down further, and file other bugs for other hang points as I identify them.

Not sure whether this belongs in the build system, the crashreporter, or somewhere else.

There are a number of these hangs, and all of them seem to happen only in win7 mingw builds, so I was assuming it has something to do with the build itself. The one in bug 1642315, for example, I narrowed down to a specific _wfopen call that is unrelated to the crashreporter.

Based on the output in this try push, the hang is narrowed down to the two blocks of code at https://searchfox.org/mozilla-central/rev/5e4d4827aa005d031580d2d17a01bae1af138b2e/toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc#189-195,248-267. The !IsOutOfProcess() conditional block isn't taken, so the code that runs between the relevant printfs is the part before that block plus the part after it.

I recently fixed a number of races in the Breakpad code; considering that code is managing the connection between child processes and the crash generation server I wouldn't be surprised if it's one more of those.

This try push further narrows this down to the client->Register() call.

It's hanging on the ConnectToPipe call. One more step until we're down to raw Windows APIs.

Final step (log here) shows the hang is at this CreateFile call.
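The narrowing in the comments above (ExceptionHandler creation, then client->Register(), then ConnectToPipe, then CreateFile) comes from reading each log for the first checkpoint that never appears. A rough sketch of that check, assuming each checkpoint prints a distinct label; the function name `FirstMissingCheckpoint` and the labels in the usage note are hypothetical, not the strings from the actual try pushes:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Given the checkpoint labels in the order they should be printed and the
// lines of a captured log, returns the index of the first label that never
// appears. Returns expected.size() if every label was logged.
std::size_t FirstMissingCheckpoint(const std::vector<std::string>& expected,
                                   const std::vector<std::string>& log_lines) {
  for (std::size_t i = 0; i < expected.size(); ++i) {
    bool seen = false;
    for (const std::string& line : log_lines) {
      if (line.find(expected[i]) != std::string::npos) {
        seen = true;
        break;
      }
    }
    if (!seen) {
      return i;  // The hang sits between checkpoint i-1 and checkpoint i.
    }
  }
  return expected.size();
}
```

For example, with expected labels "before Register", "before ConnectToPipe", "before CreateFile", "after CreateFile" and a log containing only the first three, the function returns 3, which points at the call between the last two checkpoints.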

Summary: win7 mingw build sometimes causes child process hang when creating breakpad ExceptionHandler → win7 mingw build sometimes causes child process hangs on CreateFile call when creating breakpad ExceptionHandler
Component: General → Crash Reporting
Product: Firefox Build System → Toolkit

(In reply to Gabriele Svelto [:gsvelto] from comment #4)

> I recently fixed a number of races in the Breakpad code; considering that code is managing the connection between child processes and the crash generation server I wouldn't be surprised if it's one more of those.

Yeah, I see no reason to believe any of this has anything to do with the build system (and "this series of data races only affects one particular architecture" is not a compelling argument to that end, especially when all of the buggy code seems to do inherently risky stuff involving threading). I've done my best to re-assign the bugs to appropriate owners.

The severity field is not set for this bug.
:gsvelto, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gsvelto)
Severity: -- → S4
Flags: needinfo?(gsvelto)