Closed
Bug 1265615
Opened 9 years ago
Closed 8 years ago
crash in SkPath::isRectContour
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox48 affected)
RESOLVED
WONTFIX
Tracking | Status | |
---|---|---|
firefox48 | --- | affected |
People
(Reporter: n.nethercote, Unassigned)
References
Details
(Keywords: crash, Whiteboard: [gfx-noted])
Crash Data
This bug was filed from the Socorro interface and is
report bp-90dd96d0-d626-4a4e-8c9b-98fb92160418.
=============================================================
There have been 15 crashes with this signature in the past 7 days. I think it first showed up in Firefox 48.
It's a crash on an illegal instruction. I opened the minidump in Visual Studio and it claims the bad instruction is "movss xmm0,dword ptr [ebp-20h]".
Updated•9 years ago
|
Whiteboard: [gfx-noted]
Comment 2•9 years ago
|
||
Could this have been caused by the compiler update? Are we still using /arch:IA32 everywhere? Does /arch:IA32 still work in VS2105?
Comment 3•9 years ago
|
||
It looks like VS2015 is potentially generating SSE movss instructions even though we are compiling SkPath.cpp with /arch:IA32. What to do?
Flags: needinfo?(lsalzman) → needinfo?(gps)
Comment 4•9 years ago
|
||
http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-pgo/1461065403/mozilla-central-win32-pgo-bm70-build1-build4.txt.gz confirms we're still passing /arch:IA32 to the compiler.
We may want to pour over the release notes and see if something changed in VS2015 w.r.t. SSE instructions.
Depends on: vs2015
Flags: needinfo?(gps)
Comment 5•9 years ago
|
||
Oh, we are seeing things like:
06:01:00 INFO - c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/_virtualenv/Scripts/python.exe -m mozbuild.action.cl c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/vs2015u1/VC/bin/amd64_x86/cl.EXE -FoSkPathOpsDebug.obj -c -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/dist/stl_wrappers -DNDEBUG=1 -DTRIMMED=1 -DUNICODE -D_UNICODE -DSKIA_IMPLEMENTATION=1 -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/gfx/skia -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/c -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/config -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/core -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/effects -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/gpu -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/images -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/pathops -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/ports -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/private -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/utils -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/utils/mac -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/utils/win -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/views -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/core -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/gpu -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/gpu/effects -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/gpu/gl -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/image -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/lazy -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/opts -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/sfnt -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/utils -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/utils/mac -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/utils/win -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/dist/include -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/dist/include/nspr -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/dist/include/nss -MD -FI c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -TP -nologo -wd5026 -wd5027 -Zc:sizedDealloc- -Zc:threadSafeInit- -wd4091 -D_HAS_EXCEPTIONS=0 -W3 -Gy -arch:IA32 -FS -wd4251 -wd4244 -wd4267 -wd4345 -wd4351 -wd4800 -wd4819 -wd4595 -we4553 -GR- -Zi -GL -O1 -Oi -Oy- -Fdgenerated.pdb c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/pathops/SkPathOpsDebug.cpp
06:01:01 INFO - cl : Command line warning D9025 : overriding '/arch:IA32' with '/arch:SSE2'
Not sure exactly why we're getting that D9025 warning. My guess is one of the other compiler flags implies /arch:SSE2.
Comment 6•9 years ago
|
||
On my machine, SkPath.cpp gets merged into Unified_cpp_gfx_skia5.cpp. From the linked build log:
06:03:17 INFO - c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/_virtualenv/Scripts/python.exe -m mozbuild.action.cl c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/vs2015u1/VC/bin/amd64_x86/cl.EXE -FoUnified_cpp_gfx_skia5.obj -c -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/dist/stl_wrappers -DNDEBUG=1 -DTRIMMED=1 -DUNICODE -D_UNICODE -DSKIA_IMPLEMENTATION=1 -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/gfx/skia -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/c -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/config -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/core -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/effects -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/gpu -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/images -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/pathops -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/ports -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/private -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/utils -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/utils/mac -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/utils/win -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/include/views -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/core -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/gpu -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/gpu/effects -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/gpu/gl -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/image -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/lazy -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/opts -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/sfnt -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/utils -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/utils/mac -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/gfx/skia/skia/src/utils/win -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/dist/include -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/dist/include/nspr -Ic:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/dist/include/nss -MD -FI c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/mozilla-config.h -DMOZILLA_CLIENT -TP -nologo -wd5026 -wd5027 -Zc:sizedDealloc- -Zc:threadSafeInit- -wd4091 -D_HAS_EXCEPTIONS=0 -W3 -Gy -arch:IA32 -FS -wd4251 -wd4244 -wd4267 -wd4345 -wd4351 -wd4800 -wd4819 -wd4595 -we4553 -GR- -Zi -GL -O1 -Oi -Oy- -Fdgenerated.pdb c:/builds/moz2_slave/m-cen-w32-pgo-0000000000000000/build/src/obj-firefox/gfx/skia/Unified_cpp_gfx_skia5.cpp
That has /arch:IA32 and I don't see a D9025 warning for it.
Comment 7•9 years ago
|
||
Unfortunately, build system output is interleaved from multiple processes. While I don't think we're getting that D9025 warning for Unified_cpp_gfx_skia5.cpp, I'm not 100% certain. Only way to know for sure is to build locally.
Unfortunately I don't have access to a Windows machine right this moment. But anyone should be able to test this...
Comment 8•9 years ago
|
||
It's a straight undocumented case: during PGO, what is actually compiling is the linker. And it turns out the linker does take -arch:ia32 too, which it expects passed both during profile gen and use passes. To make matters funnier, link.exe does reject -arch:ia32 when *not* doing PGO.
Updated•9 years ago
|
Assignee: nobody → mh+mozilla
Updated•9 years ago
|
Component: Graphics → Build Config
Comment 9•9 years ago
|
||
Erf no, in fact, it just takes time to complain about the option being unknown...
Comment 10•9 years ago
|
||
It does seem like a MSVC bug, where it happily compiles parts that are -arch:ia32 as if they had -arch:sse2. Interestingly, I can't reproduce with a small testcase, and it doesn't seem to be happening in smaller binaries of ours, but it's hard to tell because the CRT comes with its own functions that use SSE (but I'm assuming they are only used when SSE is supported, considering most of them have a function name starting with Xmm), so "simple" grep on disassembled sources find those...
That's about as far I can take it for today.
Assignee: mh+mozilla → nobody
Comment 11•9 years ago
|
||
Something that I didn't try but that would be worth trying, is whether this happens when building with LTCG but not PGO, which would tell us if it's a LTCG problem or a profile-induced problem.
Comment 12•9 years ago
|
||
Also, I should mention that one reason I wrote comment 8 is that the first thing I tried was to rebuild xul.dll per the profile-use pass, adding -arch:ia32, and the linker barfed that the flag was not passed during the profile-gen pass.
Comment 13•9 years ago
|
||
Note we could try solving this for the time being by surrounding the code where this occurs with a #pragma to disable optimizations for that function. It shouldn't have a critical effect on performance and may solve the issue while we figure out what's going on.
Comment 14•9 years ago
|
||
(In reply to Bas Schouten (:bas.schouten) from comment #13)
> Note we could try solving this for the time being by surrounding the code
> where this occurs with a #pragma to disable optimizations for that function.
> It shouldn't have a critical effect on performance and may solve the issue
> while we figure out what's going on.
How do you ensure it's the only place where this is happening? Are you suggesting we chase the crashes until there is none left?
Comment 15•9 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #14)
> (In reply to Bas Schouten (:bas.schouten) from comment #13)
> > Note we could try solving this for the time being by surrounding the code
> > where this occurs with a #pragma to disable optimizations for that function.
> > It shouldn't have a critical effect on performance and may solve the issue
> > while we figure out what's going on.
>
> How do you ensure it's the only place where this is happening? Are you
> suggesting we chase the crashes until there is none left?
I'm suggesting there's no reason to believe this is a common occurrence. If a crash due to this reaches a significant volume, I think preventing that crash is a better idea than waiting until we understand exactly what's going wrong. It's obviously not a long-term sustainable solution, and I never suggested it was :-).
Comment 16•9 years ago
|
||
Should we see if vs2015u2 helps here?
Updated•9 years ago
|
Comment 17•9 years ago
|
||
It wouldn't be the first time we've worked around a compiler bug. It should be feasible to get a sense of the scope of the problem by writing a little script to grep the full disassembly of xul.dll, something like (incomplete Python):
```
import subprocess
function = None
sse2_funcs = []
for line in subprocess.Popen(['dumpbin.exe', '-disasm', path_to_xul_dll], stdout=subprocess.PIPE).stdout:
if line.endswith(':\r\n'):
function = line[:-3]
print function
elif function and line_contains_sse2_instruction(line):
sse2_funcs.append(function)
print '\n'.join(sse2_funcs)
```
Comment 18•9 years ago
|
||
How feasible is it to write an automation check that verifies certain instructions don't accidentally sneak into our code? Could we maintain a whitelist of functions that are allowed to have e.g. SSE instructions?
Comment 19•9 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #16)
> Should we see if vs2015u2 helps here?
I did my tests with a fresh install, so I was using update 2.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #17)
> It wouldn't be the first time we've worked around a compiler bug. It should
> be feasible to get a sense of the scope of the problem by writing a little
> script to grep the full disassembly of xul.dll, something like (incomplete
> Python):
The problem is... there are a lot of legitimate uses of SSE instructions in xul.dll, both from our code and from the CRT. In those cases, the SSE instructions are behind a CPU check. But they still are in the binary. So you will always find SSE instructions in xul.dll. And as I wrote, I found some in smaller binaries too (like firefox.exe), from the CRT (but then that has nothing to do with LTCG).
Comment 20•9 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #18)
> How feasible is it to write an automation check that verifies certain
> instructions don't accidentally sneak into our code? Could we maintain a
> whitelist of functions that are allowed to have e.g. SSE instructions?
Depends on how accurate the symbol information is for xul.dll after PGO gets done with it. It's also possible that PGO might suddenly decide that an SSE-but-check-first function on the whitelist should be inlined into some other function, and then somebody would have to go verify that yes, it's OK that *this* function now contains SSE instructions...but then that doesn't prevent the compiler from using SSE instructions in the newly-whitelisted function. And so forth.
Actually, putting things that way, it sounds pretty tedious to do with high probability of things going wrong. This scenario is why we have things like compiler switches. :)
Comment 21•9 years ago
|
||
Did someone report this to Microsoft?
Comment 22•9 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #21)
> Did someone report this to Microsoft?
If they have, I would have expected it to be reported in the bug. I think you should report it.
Comment 23•9 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] (Vacation May 5th-11th) from comment #17)
> It wouldn't be the first time we've worked around a compiler bug. It should
> be feasible to get a sense of the scope of the problem by writing a little
> script to grep the full disassembly of xul.dll, something like (incomplete
> Python):
> ```
> import subprocess
> function = None
> sse2_funcs = []
> for line in subprocess.Popen(['dumpbin.exe', '-disasm', path_to_xul_dll],
> stdout=subprocess.PIPE).stdout:
> if line.endswith(':\r\n'):
> function = line[:-3]
> print function
> elif function and line_contains_sse2_instruction(line):
> sse2_funcs.append(function)
> print '\n'.join(sse2_funcs)
> ```
So, some interesting things to note here:
- xul.dll contains legitimate SSE, so that wouldn't work.
- while it previously didn't happen on non-PGO builds, after I confirmed it happened on LTCG-non-PGO builds, I now can reproduce on non-LTCG builds... weird, but at least it makes things easier to reproduce
- I've run dumpbin.exe on all .objs on a non-PGO build, and got a large list of .obj files, including prtime.obj.
- Since that one looked like it would be a nicer candidate for a bug report (preprocessed, it's 10 times smaller than SkPath.cpp preprocessed), I compared the obj with the assembly (from -FA), and the assembly didn't contain SSE instructions (!)
- It turns out that (what I suspect is) offset tables for switch statements are not discriminated by the the disassembler, and thanks to x86 machine code being variable length with a very large set of opcode, some of the data in those tables disassembles as SSE instructions.
- That is not, however, what happens in SkPath.obj, which does have real SSE instructions, that are visible with the .asm output from MSVC.
- So on non-PGO builds, to detect if SSE instructions are used, you need to look at the .asm output from MSVC, which means adding -FA to the build flags.
- Obviously, that doesn't help with PGO builds... although we could cross fingers and hope that nothing more than what is detected in the .asm files on non-PGO is being SSE'd.
FWIW, the list of files that I found affected by cross checking .obj and .asm (eliminating those that contain SSE or AVX in their name ; note skia was de-unified on this build):
./gfx/cairo/cairo/src/cairo-d2d-surface.obj
./gfx/graphite2/src/Unified_cpp_gfx_graphite2_src0.obj
./gfx/graphite2/src/Unified_cpp_gfx_graphite2_src1.obj
./gfx/skia/GrAAHairLinePathRenderer.obj
./gfx/skia/GrPathUtils.obj
./gfx/skia/GrPLSPathRenderer.obj
./gfx/skia/SkDistanceFieldGen.obj
./gfx/skia/SkParsePath.obj
./gfx/skia/SkPath.obj
./gfx/skia/SkScan_Hairline.obj
./js/src/Unified_cpp_js_src1.obj
./js/src/Unified_cpp_js_src11.obj
./js/src/Unified_cpp_js_src16.obj
./js/src/Unified_cpp_js_src17.obj
./js/src/Unified_cpp_js_src23.obj
./media/libvpx/rtcd.obj
./media/libyuv/libyuv_libyuv/Unified_cpp_media_libyuv0.obj
./media/libyuv/libyuv_libyuv/Unified_cpp_media_libyuv1.obj
./media/webrtc/trunk/webrtc/common_audio/common_audio_common_audio/Unified_cpp_webrtc_common_audio1.obj
Comment 24•9 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #23)
> ./media/libyuv/libyuv_libyuv/Unified_cpp_media_libyuv0.obj
> ./media/libyuv/libyuv_libyuv/Unified_cpp_media_libyuv1.obj
Those come from compare_win.cc, rotate.cc, row_win.cc and scale_win.cc all containing direct SSE assembly, and it's fine. False alarm for them.
> ./gfx/graphite2/src/Unified_cpp_gfx_graphite2_src0.obj
> ./gfx/graphite2/src/Unified_cpp_gfx_graphite2_src1.obj
These two come from Collider.cpp and Slot.cpp.
> ./media/webrtc/trunk/webrtc/common_audio/common_audio_common_audio/Unified_cpp_webrtc_common_audio1.obj
This one comes from window_generator.cc, and it's small enough that I can derive a very small test case out of it. Yay \o/ (like, as small as 16 lines of C++ + a simple command line)
Comment 25•9 years ago
|
||
Filed on MS Connect:
https://connect.microsoft.com/VisualStudio/feedback/details/2672709
Comment 26•8 years ago
|
||
Since we'll be requiring support for SSE2, does that make this bug WONTFIX?
Comment 27•8 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #26)
> Since we'll be requiring support for SSE2, does that make this bug WONTFIX?
I agree.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Updated•7 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•