Closed
Bug 1276278
Opened 8 years ago
Closed 6 years ago
Interpreter.cpp takes 165+s to compile on Windows
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: gps, Unassigned)
References
()
Details
Attachments
(2 files)
Unified_cpp_js_src31.cpp takes over 3 minutes to compile on Windows with VS2015u2 (although possibly not limited to that VS version). This cl.exe process lingers for 2+ minutes after all other Unified_cpp_js_src*.cpp cl.exe processes have finished. In other words, this is slowing down the build by 2+ minutes.
I /think/ this is a recent regression (last week or two).
On my machine, the unified source file is including:
* vm/GeneratorObject.cpp
* vm/GlobalObject.cpp
* vm/HelperThreads.cpp
* vm/Id.cpp
* vm/Interpreter.cpp
* vm/JSONParser.cpp
I sampled cl.exe at 1000Hz and the attached screenshot from Windows Performance Analyzer shows where the time is spent in cl.exe. Not sure if that's helpful.
Reporter | ||
Comment 1•8 years ago
|
||
The problem is Interpreter.cpp: it takes ~165s to compile on its own. This is on an i7-6700K, which has one of the fastest CPU cores you can buy at the moment.
Comment 2•8 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #1)
> The problem is Interpreter.cpp: it takes ~165s to compile on its own. This
> is on an i7-6700K, which has one of the fastest CPU cores you can buy at the
> moment.
This outcome is basically expected; Interpreter.cpp has one large function (one of the biggest, if not the biggest, in Gecko) that has very peculiar control flow, which probably drives up the time spent in the optimizer.
Comment 3•8 years ago
|
||
VS 2015 update 3 will have a new optimizer, it will be interesting to see if that improves performance. The profile shows most time is spent there.
Maybe we can inline less aggressively in Interpret.
A regression window would be very useful...
Reporter | ||
Comment 4•8 years ago
|
||
It still takes ~165s to compile on a48fafcdd36f, which is from ~1 month ago. So I was wrong about this being a recent regression. I guess I just started noticing.
Reporter | ||
Comment 5•8 years ago
|
||
Perhaps we could submit this source file to Microsoft as a "stress test" for their compiler in hopes they can optimize it in future toolchain versions? My sampling trace clearly indicates some hot spots. Although I'm not sure if there's anything they can do given the nature of the problem (I'm no compiler guru).
Reporter | ||
Updated•8 years ago
|
Summary: Unified_cpp_js_src31.cpp takes over 3 minutes to compile on Windows → Interpreter.cpp takes 165+s to compile on Windows
Comment 6•8 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #5)
> Perhaps we could submit this source file to Microsoft as a "stress test" for
> their compiler in hopes they can optimize it in future toolchain versions?
> My sampling trace clearly indicates some hot spots. Although I'm not sure if
> there's anything they can do given the nature of the problem (I'm no
> compiler guru).
They might be interested, yes. It's possible one of their optimization passes isn't doing the right thing.
Comment 7•8 years ago
|
||
FWIW, gcc 5.2 builds Unified_cpp_js_src31.o in ~17s on my machine, in par with other unified sources in the same directory, so this is really limited to MSVC.
Comment 8•8 years ago
|
||
Also FWIW, on an EC2 instance, building Unified_cpp_js_src31.obj with 2015 update 2 takes ~122s and building it with 14.0.24104-pre (a version that contains the new optimizer) takes ~121s (if I didn't mess up my attempt at using it), so about the same time.
Reporter | ||
Comment 9•8 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #8)
> Also FWIW, on an EC2 instance, building Unified_cpp_js_src31.obj with 2015
> update 2 takes ~122s
<sidetrack> Which build configuration is that? EC2 instances are running Intel Xeons from a few generations ago and don't have 4.0 GHz clock speeds: I'm surprised you report it taking shorter than my i7-6700K (Skylake at 4.0GHz). I'm building without a mozconfig and Unified_cpp_js_src31.obj takes ~153s (best of 3 runs, all of which were very consistent).
Here's my cl.exe invocation as reported by Process Explorer:
cl.EXE -FoUnified_cpp_js_src31.obj -c -DNDEBUG=1 -DTRIMMED=1 -D_CRT_RAND_S -DENABLE_BINARYDATA -DENABLE_SIMD -DENABLE_SHARED_ARRAY_BUFFER -DEXPORT_JS_API -DJS_HAS_CTYPES -DDLL_PREFIX=\"\" -DDLL_SUFFIX=\".dll\" -DFFI_BUILDING -Ic:/dev/src/firefox-unified/js/src -Ic:/dev/src/firefox-unified/obj-i686-pc-mingw32/js/src -Ic:/dev/src/firefox-unified/obj-i686-pc-mingw32/js/src/ctypes/libffi/include -Ic:/dev/src/firefox-unified/intl/icu/source/common -Ic:/dev/src/firefox-unified/intl/icu/source/i18n -Ic:/dev/src/firefox-unified/obj-i686-pc-mingw32/dist/include -Ic:/dev/src/firefox-unified/obj-i686-pc-mingw32/dist/include/nspr -MD -FI c:/dev/src/firefox-unified/obj-i686-pc-mingw32/js/src/js-confdefs.h -DMOZILLA_CLIENT -TP -nologo -wd4345 -wd4351 -wd4800 -wd4819 -wd4595 -D_CRT_SECURE_NO_WARNINGS -wd5026 -wd5027 -Zc:sizedDealloc- -Zc:threadSafeInit- -D_HAS_EXCEPTIONS=0 -W3 -Gy -arch:SSE2 -FS -wd4244 -wd4267 -wd4251 -we4553 -GR- -Zi -O2 -Oy -wd4805 -wd4661 -we4067 -we4258 -we4275 -wd4146 -wd4577 -wd4312 -Fdgenerated.pdb c:/dev/src/firefox-unified/obj-i686-pc-mingw32/js/src/Unified_cpp_js_src31.cpp -showIncludes
Comment 10•8 years ago
|
||
cl.exe -FoUnified_cpp_js_src31.obj -c -DNDEBUG=1 -DTRIMMED=1 -D_CRT_RAND_S -DENABLE_BINARYDATA -DENABLE_SIMD -DENABLE_SHARED_ARRAY_BUFFER -DEXPORT_JS_API -Ic:/Users/Administrator/mozilla-central/js/src -Ic:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/js/src -Ic:/Users/Administrator/mozilla-central/intl/icu/source/common -Ic:/Users/Administrator/mozilla-central/intl/icu/source/i18n -Ic:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/dist/include -Ic:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/dist/include/nspr -MD -FI c:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/js/src/js-confdefs.h -DMOZILLA_CLIENT -TP -nologo -wd4345 -wd4351 -wd4800 -wd4819 -wd4595 -D_CRT_SECURE_NO_WARNINGS -wd5026 -wd5027 -Zc:sizedDealloc- -Zc:threadSafeInit- -D_HAS_EXCEPTIONS=0 -W3 -Gy -arch:SSE2 -FS -wd4244 -wd4267 -wd4251 -we4553 -GR- -Zi -O2 -Oy -wd4805 -wd4661 -we4067 -we4258 -we4275 -wd4146 -wd4577 -wd4312 -Fdgenerated.pdb c:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/js/src/Unified_cpp_js_src31.cpp
Note that was a --enable-project=js --disable-shared-js build, but that shouldn't matter.
Windows reports this as CPU: Xeon E5-2666 v3 @ 2.90GHz.
Comment 11•8 years ago
|
||
(same timing in a Firefox build, unsurprisingly)
Reporter | ||
Comment 12•8 years ago
|
||
I built with your exact arguments modulo the paths and it didn't appear to change timings.
I tried setting the processor affinity for the process to a single core, but that didn't improve things.
I changed the processor priority to its highest setting and that didn't move the needle.
Task Manager is saying my CPU is clocked at 4.0+GHz for the duration of the process lifetime (then it backs off when idle).
I'm, uh, slightly disturbed by this disparity.
One thing I can think of is CPU cache: the Xeons should have more of it. If we're in some tight-ish loops, extra CPU cache could matter. FWIW, the E5-2666 is a custom SKU made for Amazon. IIRC it has better base and burst CPU frequency than retail Xeons. But that's still a Haswell versus a Skylake. I want to believe the cycle-for-cycle efficiency of the Skylake is greater.
Reporter | ||
Comment 13•8 years ago
|
||
Filed https://connect.microsoft.com/VisualStudio/feedback/details/2757523/compiling-firefox-spidermonkey-javascript-engines-interpreter-cpp-takes-2-minutes-10x-slower-than-gcc-clang upstream. Surely it can't hurt to upvote it :)
One workaround might be to change the build system to build SOURCES first, before UNIFIED_SOURCES, and then putting Interpreter.cpp in SOURCES. That way a core will end up building Interpreter.cpp while everything else is going on, hopefully having everything finish at roughly the same time. (The Interpreter.cpp build takes *5 minutes* on my Dell XPS 15 laptop; 2.75Ghz Core i7, VS 2015u2)
Comment 15•8 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #13)
> Filed
> https://connect.microsoft.com/VisualStudio/feedback/details/2757523/
> compiling-firefox-spidermonkey-javascript-engines-interpreter-cpp-takes-2-
> minutes-10x-slower-than-gcc-clang upstream.
They need more information from us. Did you provide it somehow or do we still have to do that?
Updated•8 years ago
|
Flags: needinfo?(gps)
Reporter | ||
Comment 16•8 years ago
|
||
Preprocessed output for Interpreter.cpp.
Reporter | ||
Comment 17•8 years ago
|
||
OK, hopefully they have enough details now. Not sure why my uploads never went through :(
Flags: needinfo?(gps)
Updated•8 years ago
|
Reporter | ||
Updated•8 years ago
|
Comment 18•6 years ago
|
||
should this be resolved FIXED and/or bug 1325686 be backed out now that clang is used on windows?
Flags: needinfo?(gps)
Reporter | ||
Comment 19•6 years ago
|
||
We still support building on MSVC. But Clang is the default and MSVC optimization is now a lower priority. So if the owners of this code want to resolve this bug as WONTFIX or WORKSFORME or whatnot, you will have no objections from me :)
Flags: needinfo?(gps)
Comment 20•6 years ago
|
||
Yep. MSVC is on its way out, so we don't care about its compile times. As for backing out bug 1325686 -- I don't know how long clang takes to compile this file, but just based on the idea that having all compilation units take roughly the same amount of time is good, and that compilation time is correlated with line count, I think I'd rather keep it separate.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•