Closed Bug 1639153 Opened 5 years ago Closed 3 years ago

Optimize indirect calls by not pessimizing same-instance calls

Categories

(Core :: JavaScript: WebAssembly, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
84 Branch

People

(Reporter: dbezhetskov, Assigned: dbezhetskov)

References

(Blocks 1 open bug)

Details

(Keywords: perf-alert)

Attachments

(20 files, 9 obsolete files)

(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), application/x-javascript
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), application/x-tar
Details
No description provided.
Assignee: nobody → dbezhetskov
Attachment #9150086 - Attachment description: Bug 1639153 - change call_indirect signature check to happen after frame is pushed → Bug 1639153 - change call_indirect signature check to happen after frame is pushed r=lth,wingo

I'm a bit confused about putting 2 extra words in each FunctionTableElem; I had been thinking the goal state is to get tables down to a single word. What's the benefit of performing the call_indirect signature check after pushing the frame? If the signature happens before pushing a frame, the trampoline can first check the signature, then push the frame, then call the tail entry (which would be after the signature check). Then there are only two entries per function.

Also nit: could we rename "tail entry" to "trampoline entry"? "Tail" keeps making me thing "tail call entry".

Hi, I understand your confusion.
Our final target is still to have two entries table and trampoline.

Final goal:
call_indirect -> call table_entry -> Trampoline stub -> target function

Trampoline stub:
check signature
check for instance equality
switch to callee instance
call the trampoline_entry
switch to caller instance

However, it's hard to achieve this in one commit and if I succeed to create such big commit (I tried to do this) it takes so much time because I don't know what could go wrong, so Andy and I decided to split this work into series of extremely short patches. It's maintained in the Steps to get there section of the abi-2020 doc (https://docs.google.com/document/d/1oi6ROZJuE-hBb21Tgq-XcUegPktecF52gADzBJcYL34).

Long story short:
I will push inlined trampoline stub right in call_indirect and then adapt stack maps and frame iteration.
Next, I'm going to finally remove Frame::tls.
Then we move inlined trampoline into the stub. Here we don't need to care about runtime stuff because it's already done in the previous steps.

Because I'm going to inline trampoline stub on caller side we need to move signature check after the frame is pushed. Because obviously, we can't do the check on the caller side and this is why I moved it.

WDYT?

Flags: needinfo?(luke)

An additional question about the final design and the current signature check optimization:
AFAIU a thunk (aka trampoline aka decorator) is a fixed size stub with some connected data and it will be generated independently from the function.

So, we can't bake cmp with immediate into it, as we do know.
We can only load funcTypeId from memory and then do a check, right?

To add on -- we will eventually need to be able to perform signature checks with a frame already on the stack, in the tail-call case. So it's more consistent if the frame is always present when signature-check traps are raised.

To expand also on the number of function entry points -- right now only the function is in a position to check its signature. So we need a "call entry" (target of a call instruction) and a "tail entry" (target of a jump), and for each we need a checked and unchecked variant. Before Dmitry's work there are two entries (checked and unchecked call). Dmitry's current patch adds checked and unchecked tail entries.

In the future it would be possible to put the signature in the table associating code pointer, JSFunction*, and TlsData, and arrange to only check signatures in the trampoline -- if that's what we want to do. (After all, anyone that calls direct already knows that the callee has the right type.) That would allow us to reduce to only a call and a tail entry. Not sure if a good optimization or not, though.

WDYT Luke?

Apologies for the battery of comments -- a tail entry is indeed that: an entry for a tail call. Whether it's a tail call resulting from a WebAssembly tail-call proposal (something we're not working on) or the result of a jump from a trampoline is a detail -- the salient point is that the fixed part of the frame is already on the stack.

Haha another small apology -- Luke you may be wondering, why implement tail entries when we are not working on tail calls?

The answer is in the design doc:

As an optimization, when control enters an indirect entry trampoline, the trampoline can check if the incoming WasmTlsReg is actually same-instance, and in that case can jump to the type-checking tail entry. In that case there will be no trampoline frame and the callee will return directly to the caller.

So in that case, the trampoline can tail-call the callee, by jumping to the callee's tail entry. But because the trampoline can't currently check the callee's signature, we target the checked tail entry.

AFAIU the unchecked tail entry would only ever be invoked by a direct tail call from the tail calls proposal, so I guess that bit isn't strictly necessary.

Thanks for all the responses and info; understanding that the goal here is to break up a larger patch into smaller independent patches makes total sense. Would it make sense to hold off on landing the patches until the final state is better than the current state? I have a light concern that we get stuck for a while at an in-between state.

Also, "tail call entry" now makes sense; I like the symmetry between trampolines and tail calls :)

The 4 entry points also make sense. Instead of recording dynamic offsets for all of them, would it be possible (now or later) to have static offsets, such that from a pointer to any one entry point you can compute the others? The reason I ask is thinking ahead toward the function-references proposal when you have one function pointer (a funcref or (ref (func ...))) that can be called with a fixed or dynamic signature with a normal or tail call, so it seems like we'd want to have function references point to one canonical entry point and have all the other entry points be static offsets made at the callsite.

(In reply to Dmitry Bezhetskov from comment #4)

An additional question about the final design and the current signature check optimization:
AFAIU a thunk (aka trampoline aka decorator) is a fixed size stub with some connected data and it will be generated independently from the function.

So, we can't bake cmp with immediate into it, as we do know.
We can only load funcTypeId from memory and then do a check, right?

Given a wasm function signature (and no other context), you can statically classify it as having either an immediate representation (where the signature is literally baked into the immediate, which means the immediate is invariant over executions and thus can be (de)serialized without patched) or requires an indirect representation. Thus, if the signature is classified as an immediate, then the trampoline can indeed bake in the cmp-with-immediate. (And if the signature is classified as indirect, then the trampoline can load it from the callee TlsData.)

Flags: needinfo?(luke)

Good morning! Good questions, Luke :) Point by point:

  1. About patch landing order. Completely agreed that we need to avoid shipping regressions. We operate a bit on instinct here as to what is a regression and what isn't; my impression was that in the planned change series at https://docs.google.com/document/d/1oi6ROZJuE-hBb21Tgq-XcUegPktecF52gADzBJcYL34/edit#heading=h.22ki0jodkf2h, no change would regress in a significant way.

I understand your concern to also be about leaving the tree in an awkward state for a long time, but I think Dmitry is on a roll at this point and when the inline trampoline code lands -- which is the next patch or so -- then we should start to see net performance improvements.

If that were all, I would be inclined to agree with you to hold off until a series can land, but on our side there has been the risk of making patches too big to review, or patches that don't do the right thing, or patches that simply need some style fixes. I think the pressure of getting something good enough to incrementally land has been a real positive thing for progress on this feature and so weighing everything, I would continue to land small steps toward the goal -- as long as we are all generally on board about the goal and the general plan for getting there.

  1. Static offsets would be great! Right now I think that might not be possible because of whether the function has an immediate signature check or not. But yes, I think you are right we can get there.

  2. Immediate signature checks: I think we were under the impression that in the final design, trampoline code would not be signature-dependent -- and so if you were to consider moving signature checks to trampolines, probably you would make all checks load the expected signature from the storage associated with the trampoline instead of having immediate comparisons. I was also unsure if immediate vs load-from-memory was important here -- the branch would surely predict perfectly, and it would "just" be another couple µops in the pipeline per indirect call. Dunno. WDYT here? As I said we have to operate a bit on instincts and ours could be miscalibrated.

Thanks for the explanations!

  1. Makes sense, sgtm
  2. I was thinking that differences of that nature could be covered up by nop'ing up to the desired static offset. But agreed that we don't need to do this now.
  3. Agreed that immediate vs load-from-memory is probably not a huge deal, but just in the interest of not giving it up unnecessarily: how would the trampoline be signature independent? My operating assumption has been that tramplines are statically specialized to the callee (so that they can use a direct call, and also bake in the callee's instance pointer as an immediate), and thus they could know the callee's expected signature as an immediate

Getting back to whether the signature check happens before or after pushing the frame: it makes sense that, for tail calls, the callee frame will have already been pushed, so this should be allowed. However, I think doing so for all call entries will cause extra prologue code to be generated for the two checked entries per function (since they must push their own frame and then jump over the frame-pushing of the unchecked entry). If the signature can be checked before pushing the frame, then the checked entries can check and then fall-through to the frame-pushing of the unchecked entries. IIRC from my measurements, b/c of the large number of small functions in Unity-type apps, prologue-generated machine code is a significant fraction of overall code size. Also icache locality would be somewhat improved. But I forget: are there any concrete problems with checking the signature before pushing the frame?

Are there any concrete problems with checking the signature before pushing the frame?

There are two nits and one serious problem.

  1. Nit: With tail calls, you have to deal with the signature check trap being raised either with or without the frame.
  2. Nit: If overhead is a problem and tail calls will come, then we should find a solution that allows tail calls without overhead.
  3. Problem: If a function is called from a trampoline, it needs to not set FP and instead use the FP set by the trampoline.

I think this last issue is the motivator to have checked call entries and checked tail entries. WDYT @luke ?

Luke, what is the plan for import calls? They use Frame::tls too, I suppose we are going to handle call_import in the same manner as call_indirect? It means that we are going to call thunk to switch instances.
WDYT?

Flags: needinfo?(luke)

Yeah, good question. I wasn't thinking about import calls b/c they are usually just simplified versions of indirect calls in that you statically know you're calling out of the instance. Thus, if it was easy, it would make sense to statically (at the callsite) do something instead of using a trampoline. That being said, I don't see an easy way to do that in this case so yeah, probably best just to call a trampoline just like indirect calls.

Flags: needinfo?(luke)

There are still a few remaining uses of masm.loadWasmTlsRegFromFrame():

  1. CodeGenerator::visitWasmCall
    here we reload Tls because Ion assumes that WasmTlsReg is non-volatile.
    After implementing trampoline and uses them for call_import and call_indirect we can safely remove this.
  2. MacroAssembler::callWithABI
    afaik we set tls here only for wasm BuiltinThunks. A BuiltinThunk uses WasmTlsReg only in the prologue/epilogue, so when we delete Frame::tls this will become obsolete and we can remove it then.
Pushed by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/578160fb0ac3 change call_indirect signature check to happen after frame is pushed r=lth,wingo

Backed out changeset 578160fb0ac3 (bug 1639153) for WasmFrameIter.cpp related bustage

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=build&fromchange=578160fb0ac35c82a12139b9de86ecf2d791928a&tochange=dac0f3345b5984106adcf22d20e2bceb4a8fb4e6&selectedTaskRun=NWgXbXLXTBa5zNMyhbNQyA-0

Backout link: https://hg.mozilla.org/integration/autoland/rev/dac0f3345b5984106adcf22d20e2bceb4a8fb4e6

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305426059&repo=autoland&lineNumber=5524

task 2020-06-08T06:29:22.459Z] make[3]: Leaving directory '/builds/worker/workspace/build/src/obj-spider/js/src/irregexp'
[task 2020-06-08T06:29:22.459Z] /builds/worker/fetches/gcc/bin/g++ -std=gnu++17 -o Unified_cpp_js_src_frontend0.o -c  -I/builds/worker/workspace/build/src/obj-spider/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/workspace/build/src/js/src/frontend -I/builds/worker/workspace/build/src/obj-spider/js/src/frontend -I/builds/worker/workspace/build/src/obj-spider/js/src -I/builds/worker/workspace/build/src/js/src -I/builds/worker/workspace/build/src/obj-spider/dist/include -I/builds/worker/workspace/build/src/obj-spider/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-spider/js/src/js-confdefs.h -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wformat -Wformat-overflow=2 -Wno-noexcept-type -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -freorder-blocks -O3 -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow -Wno-attributes  -MD -MP -MF .deps/Unified_cpp_js_src_frontend0.o.pp   Unified_cpp_js_src_frontend0.cpp
[task 2020-06-08T06:29:22.459Z] js/src/frontend/Unified_cpp_js_src_frontend1.o
[task 2020-06-08T06:29:23.263Z] /builds/worker/fetches/gcc/bin/g++ -std=gnu++17 -o Unified_cpp_js_src_jit10.o -c  -I/builds/worker/workspace/build/src/obj-spider/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/workspace/build/src/js/src/jit -I/builds/worker/workspace/build/src/obj-spider/js/src/jit -I/builds/worker/workspace/build/src/obj-spider/js/src -I/builds/worker/workspace/build/src/js/src -I/builds/worker/workspace/build/src/obj-spider/dist/include -I/builds/worker/workspace/build/src/obj-spider/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-spider/js/src/js-confdefs.h -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wformat -Wformat-overflow=2 -Wno-noexcept-type -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -freorder-blocks -O3 -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow -Wno-attributes  -MD -MP -MF .deps/Unified_cpp_js_src_jit10.o.pp   Unified_cpp_js_src_jit10.cpp
[task 2020-06-08T06:29:23.264Z] js/src/jit/Unified_cpp_js_src_jit11.o
[task 2020-06-08T06:29:23.986Z] /builds/worker/fetches/gcc/bin/g++ -std=gnu++17 -o Unified_cpp_js_src_wasm1.o -c  -I/builds/worker/workspace/build/src/obj-spider/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/workspace/build/src/js/src/wasm -I/builds/worker/workspace/build/src/obj-spider/js/src/wasm -I/builds/worker/workspace/build/src/obj-spider/js/src -I/builds/worker/workspace/build/src/js/src -I/builds/worker/workspace/build/src/obj-spider/dist/include -I/builds/worker/workspace/build/src/obj-spider/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-spider/js/src/js-confdefs.h -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wformat -Wformat-overflow=2 -Wno-noexcept-type -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -freorder-blocks -O3 -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow -Wno-attributes  -MD -MP -MF .deps/Unified_cpp_js_src_wasm1.o.pp   Unified_cpp_js_src_wasm1.cpp
[task 2020-06-08T06:29:23.986Z] js/src/wasm/Unified_cpp_js_src_wasm2.o
[task 2020-06-08T06:29:26.656Z] In file included from Unified_cpp_js_src_wasm1.cpp:11:0:
[task 2020-06-08T06:29:26.656Z] /builds/worker/workspace/build/src/js/src/wasm/WasmFrameIter.cpp: In function 'void js::wasm::GenerateFunctionPrologue(js::jit::MacroAssembler&, const js::wasm::FuncTypeIdDesc&, const mozilla::Maybe<unsigned int>&, js::wasm::FuncOffsets*)':
[task 2020-06-08T06:29:26.656Z] /builds/worker/workspace/build/src/js/src/wasm/WasmFrameIter.cpp:581:3: error: static assertion failed: code aligned
[task 2020-06-08T06:29:26.656Z]    static_assert(CheckedTailEntryOffset % CodeAlignment == 0, "code aligned");
[task 2020-06-08T06:29:26.656Z]    ^~~~~~~~~~~~~
[task 2020-06-08T06:29:28.430Z]    Compiling regex v1.3.3
[task 2020-06-08T06:29:28.430Z]      Running `CARGO=/builds/worker/fetches/rustc/bin/cargo CARGO_MANIFEST_DIR=/builds/worker/workspace/build/src/third_party/rust/regex CARGO_PKG_AUTHORS='The Rust Project Developers' CARGO_PKG_DESCRIPTION='An implementation of regular expressions for Rust. This implementation uses
[task 2020-06-08T06:29:28.430Z] finite automata and guarantees linear time matching on all inputs.
[task 2020-06-08T06:29:28.430Z] ' CARGO_PKG_HOMEPAGE='https://github.com/rust-lang/regex' CARGO_PKG_NAME=regex CARGO_PKG_REPOSITORY='https://github.com/rust-lang/regex' CARGO_PKG_VERSION=1.3.3 CARGO_PKG_VERSION_MAJOR=1 CARGO_PKG_VERSION_MINOR=3 CARGO_PKG_VERSION_PATCH=3 CARGO_PKG_VERSION_PRE= LD_LIBRARY_PATH='/builds/worker/workspace/build/src/obj-spider/release/deps:/builds/worker/fetches/rustc/lib:/builds/worker/workspace/build/src/obj-spider/dist/bin:/builds/worker/fetches/gcc/lib64' /builds/worker/fetches/rustc/bin/rustc --crate-name regex /builds/worker/workspace/build/src/third_party/rust/regex/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts --crate-type lib --emit=dep-info,metadata,link -C opt-level=2 --cfg 'feature="aho-corasick"' --cfg 'feature="default"' --cfg 'feature="memchr"' --cfg 'feature="perf"' --cfg 'feature="perf-cache"' --cfg 'feature="perf-dfa"' --cfg 'feature="perf-inline"' --cfg 'feature="perf-literal"' --cfg 'feature="std"' --cfg 'feature="thread_local"' --cfg 'feature="unicode"' --cfg 'feature="unicode-age"' --cfg 'feature="unicode-bool"' --cfg 'feature="unicode-case"' --cfg 'feature="unicode-gencat"' --cfg 'feature="unicode-perl"' --cfg 'feature="unicode-script"' --cfg 'feature="unicode-segment"' -C metadata=59cb2151c37e484b -C extra-filename=-59cb2151c37e484b --out-dir /builds/worker/workspace/build/src/obj-spider/release/deps -C linker=/builds/worker/workspace/build/src/build/cargo-linker -L dependency=/builds/worker/workspace/build/src/obj-spider/release/deps --extern aho_corasick=/builds/worker/workspace/build/src/obj-spider/release/deps/libaho_corasick-ae60b24f6684d433.rmeta --extern memchr=/builds/worker/workspace/build/src/obj-spider/release/deps/libmemchr-f1b26a4756e91620.rmeta --extern regex_syntax=/builds/worker/workspace/build/src/obj-spider/release/deps/libregex_syntax-a209697783b4651f.rmeta --extern thread_local=/builds/worker/workspace/build/src/obj-spider/release/deps/libthread_local-df70bf44d84bd469.rmeta --cap-lints warn`
[task 2020-06-08T06:29:30.442Z] /builds/worker/workspace/build/src/config/rules.mk:746: recipe for target 'Unified_cpp_js_src_wasm1.o' failed
[task 2020-06-08T06:29:30.442Z] make[3]: *** [Unified_cpp_js_src_wasm1.o] Error 1
[task 2020-06-08T06:29:30.443Z] make[3]: Leaving directory '/builds/worker/workspace/build/src/obj-spider/js/src/wasm'
[task 2020-06-08T06:29:30.443Z] /builds/worker/workspace/build/src/config/recurse.mk:74: recipe for target 'js/src/wasm/target-objects' failed
[task 2020-06-08T06:29:30.443Z] make[2]: *** [js/src/wasm/target-objects] Error 2
[task 2020-06-08T06:29:30.443Z] make[2]: *** Waiting for unfinished jobs....
[task 2020-06-08T06:29:30.443Z] /builds/worker/fetches/gcc/bin/g++ -std=gnu++17 -o Unified_cpp_js_src_frontend1.o -c  -I/builds/worker/workspace/build/src/obj-spider/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/workspace/build/src/js/src/frontend -I/builds/worker/workspace/build/src/obj-spider/js/src/frontend -I/builds/worker/workspace/build/src/obj-spider/js/src -I/builds/worker/workspace/build/src/js/src -I/builds/worker/workspace/build/src/obj-spider/dist/include -I/builds/worker/workspace/build/src/obj-spider/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-spider/js/src/js-confdefs.h -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wformat -Wformat-overflow=2 -Wno-noexcept-type -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -freorder-blocks -O3 -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow -Wno-attributes  -MD -MP -MF .deps/Unified_cpp_js_src_frontend1.o.pp   Unified_cpp_js_src_frontend1.cpp
[task 2020-06-08T06:29:30.443Z] js/src/frontend/Unified_cpp_js_src_frontend2.o
[task 2020-06-08T06:29:31.950Z] In file included from Unified_cpp_js_src_frontend0.cpp:29:0:
[task 2020-06-08T06:29:31.950Z] /builds/worker/workspace/build/src/js/src/frontend/BytecodeEmitter.cpp: In member function 'bool js::frontend::BytecodeEmitter::emitSetOrInitializeDestructuring(js::frontend::ParseNode*, js::frontend::DestructuringFlavor)':
[task 2020-06-08T06:29:31.950Z] /builds/worker/workspace/build/src/js/src/frontend/BytecodeEmitter.cpp:2653:48: warning: 'kind' may be used uninitialized in this function [-Wmaybe-uninitialized]
[task 2020-06-08T06:29:31.950Z]          NameOpEmitter noe(this, name, loc, kind);
[task 2020-06-08T06:29:31.950Z]                                                 ^
[task 2020-06-08T06:29:33.100Z] /builds/worker/workspace/build/src/js/src/frontend/Parser.cpp: In member function 'typename ParseHandler::ClassNodeType js::frontend::GeneralParser<ParseHandler, Unit>::classDefinition(js::frontend::YieldHandling, js::frontend::GeneralParser<ParseHandler, Unit>::ClassContext, js::frontend::DefaultHandling) [with ParseHandler = js::frontend::FullParseHandler; Unit = char16_t]':
[task 2020-06-08T06:29:33.100Z] /builds/worker/workspace/build/src/js/src/frontend/Parser.cpp:7356:16: warning: 'innerName' may be used uninitialized in this function [-Wmaybe-uninitialized]
[task 2020-06-08T06:29:33.100Z]    NameNodeType innerName;
...
Flags: needinfo?(dbezhetskov)
Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a4d1e2c3ba9e change call_indirect signature check to happen after frame is pushed r=lth,wingo
Backout by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9bd3381b2697 Backed out changeset a4d1e2c3ba9e for causing bustages in WasmFrameIter.cpp
Pushed by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4cf40cf33b0a change call_indirect signature check to happen after frame is pushed r=lth,wingo
Pushed by rmaries@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f1beae5af856 change call_indirect signature check to happen after frame is pushed r=lth,wingo

issues with signature moving are resolved and now the changes are landed.
The problem was in --disable-jit configuration, I messed up with MacroAssembler-none.h.

Flags: needinfo?(dbezhetskov)
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
Keywords: leave-open
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: mozilla79 → ---
Pushed by dluca@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/251dbf12af47 Remove using Frame::tls in Jit import exit stub r=lth,wingo
Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f82a44c95497 Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo
Pushed by ncsoregi@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b620533c6d63 Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/caa8552f4f52 Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo

fixed and pushed updated changes

Flags: needinfo?(dbezhetskov)
Attachment #9157867 - Attachment description: Bug 1639153 - interpreter entry pushes trampoline frame to preserve tls for runtime r=lth → Bug 1639153 - Interpreter entry pushes trampoline frame to preserve tls for runtime r=lth

(In reply to Pulsebot from comment #33)

Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b620533c6d63
Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo

Hi, it seems this change caused build failure on mips platform, is it ok that I submit the fix patch to this bug? Or should I file a new bug? Thanks!

Hu Zhao, I think it is OK to submit patches here if the number of patches is small.
BTW, I wonder that we support mips platform at all.

(In reply to Dmitry Bezhetskov from comment #40)

BTW, I wonder that we support mips platform at all.

We do have MIPS support but it's a tier-3 platform, supported not by Mozilla but by external contributors (most of them at Longsoon). There are MIPS simulators in the tree in the same way as arm/arm64, and sometimes we try to keep the MIPS code working by running them, but usually we just change things as we want and wait for the MIPS support team to catch up.

Pushed by ncsoregi@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5078030eaeaf [MIPS] Fix a recent refactoring to wasm::Frame on mips. r=lth
Attachment #9157867 - Attachment is obsolete: true
Attached file call_indirect_ubench.js (obsolete) (deleted) —

Microbenchmark for call_indirect. On my Xeon workstation, I get these numbers with yesterday's mozilla-central (JS shell, configure with --disable-debug --enable-optimize --enable-release, run with --wasm-compiler=ion):

external: 388.2451171875
internal: 333.218017578125

This suggests that in the code that has landed so far, indirect calls to module-internal functions are about 14% faster than indirect calls to imported functions.

Attached file call_indirect_ubench.js (deleted) —

Updated to add direct calls as a comparison. These come in at 187ms roughly, ie, at about 56% of the module-internal indirect calls.

Attachment #9159321 - Attachment is obsolete: true

For my amd machine it is:

branches/default/tip Fri Jun 26:

external: 256.566162109375
internal: 254.14599609375
direct: 186.510986328125

branches/default/tip Fri Jun 26 + additional trampoline frame:

external: 420.4189453125
internal: 279.97607421875
direct: 186.052978515625

Direct calls aren't affected.
Internal: 9% slowdown. It's because we add additional "if" there, before we just load/reload tls from the frame.
External: 64% slowdown. Pushing additional trampoline frame + if + additional call/return into/from trampoline.

If we didn't push Frame::tls we would save:
2 push for internals, one for the function's frame and one for the trampoline
3 push for external

Luke, if we had the same approach with trampolines for import calls it would that they will slower than now. As you say there isn't an easy way to make a workaround for import calls because we need to somehow signalize for frame iteration that instance is changed.

Actually, I can say that there will be the same slowdown as for cross-instance case for call_indirect. It is about 50% slowdown compared to the previous version where we just WasmTlsReg <- callee's tls and call callee.code.
The current plan is good, but it has also such consequences, lemme show trade-offs for current plan:

  • very fast direct calls, actually no overhead at all
  • a little bit more code, about 4 bytes per function for nops
  • import calls and cross-instance call_indirect become slow. We add constant overhead for this, additional call/return + push trampoline's data
  • trampoline doesn't need to copy stack args because we address it from FP.

I don't have a whole picture of wasm in SpiderMonkey, but I think you have. So, these tradeoffs are ok for us or not? I'm asking because saw the web IDL binding proposal and it looks like we are going to use import calls to call WebIDL.

There is another tradeoff we can do. We can fuse your original idea with the current plan.
Let me describe this a little bit more:
Your original idea was to add slots after stack arguments and then track tls via CallSiteDesc.
I've investigated this earlier and I can say that it is doable if the number of slots is equal 2. (I have abandoned PR for this).
One for calleeTls and one other one for callerTls. We need to preserve it only on possible cross-instance calls.
The current plan with addressing from FP and trampolines allows us don't copy arguments when we call a trampoline.
So, with this fusion we have:

  • a little bit slowdown for all calls included direct calls because we always allocate two slots after stack args, even if there is no any.
  • import calls don't need to call a trampoline because runtime already knows about this transition.
  • the amount of memory by wasm function is increased by 2 slots for each call.
  • trampolines already have slots for callerTls and calleeTls so it needs only to prepare a frame for the callee and jump into it.
  • trampoline doesn't need to copy stack args because we address it from FP.

You may ask, what is actually slowdown for direct calls because they are very frequent in wasm demos.
I've added to slots for all calls and measured on the benchmark (provided by Lars Hansen). There we call the function with no stack arguments so it's the most frequent cause for zengarden and other benchmarks.

I've run this test many times and calculate the average. The test machine is my hp laptop with ubuntu 18.04 and inter core i7 on board.
Base revision is (5c9a4697057bd5ef8e15e6223e049674107ca433 Wed Jul 1 Bug 1147091 - Add StableSort to nsTArray_Impl. r=froydnj Simon Giesecke <sgiesecke@mozilla.com>). Flags: --disable-debug --enable-optimize --enable-release, run with --wasm-compiler=ion

with two additional slots:
external: 267.17
internal: 254.44
direct: 140.26

reference:
external: 269.44
internal: 260.42
direct: 145.84

It looks strange but this is what we have. I compared many times but it looks like there no overhead for all calls and even some improvements..
The amount of memory is increased, of course, it is about 2 slots for each call (depends on the alignment).

So, wdyt about these two cases?
First one is ok and constant overhead doesn't scare us
or we increase each call by two slots and have cheap import calls and cheap trampolines?

Flags: needinfo?(luke)
Flags: needinfo?(lth)
Flags: needinfo?(lth) → needinfo?(lhansen)

One possible disadvantage of the second approach is that for same-instance call_indirect we need to preserve tls because for runtime we can't distinguish between import call and call_indirect they both have kind == dynamic, but it's just two store instruction.

I think slowing down cross-instance indirect and import calls is not good, so anything we can do to mitigate that is worth thinking hard about.

I don't think "adding two extra slots" should cost anything in the no-stack-args case if we do things right. Ion creates the stack frame with enough space for all outgoing arguments on function entry, so there should be no reason to make any adjustment of the SP just prior to the call - the adjustment should be made on function entry when we create the local frame. I thought that there might be an added small cost in functions that make zero-stack-arg calls and that themselves do not currently need to allocate any stack memory, like int f(int x) { return g(x)+1 }, but currently we're allocating a frame (8 bytes unused memory) for those already to ensure proper stack alignment in the callee, so there would be no regression there either.

I'm a little more worried about code size, frankly, because programs are becoming large quickly. In the case of my function f above I see 14 bytes of NOPs on x64 currently:

00000002  55                        push %rbp
00000003  48 8b ec                  mov %rsp, %rbp
00000006  0f 1f 00                  nopl %eax, (%rax)
00000009  0f 1f 80 00 00 00 00      nopl %eax, (%rax)
00000010  41 83 fa 23               cmp $0x23, %r10d
00000014  0f 84 0c 00 00 00         jz 0x0000000000000026
0000001A  0f 0b                     ud2
0000001C  0f 1f 40 00               nopl %eax, (%rax)
00000020  41 56                     push %r14
00000022  55                        push %rbp
00000023  48 8b ec                  mov %rsp, %rbp
00000026  48 83 ec 08               sub $0x08, %rsp
...
Flags: needinfo?(lhansen)

Just to check: when you say that all calls would have 2 extra slots (for callerTLS/calleeTLS), are those slots always allocated, but only initialized in the cross-instance cases by the trampolines? If so, then I agree it should be practically free so that seems like a good idea.

Flags: needinfo?(luke)

(In reply to Luke Wagner [:luke] from comment #51)

Just to check: when you say that all calls would have 2 extra slots (for callerTLS/calleeTLS), are those slots always allocated, but only initialized in the cross-instance cases by the trampolines? If so, then I agree it should be practically free so that seems like a good idea.

We need to fill these slots for indirect calls. Imports calls will always store tls and calls via trampolines will fill them according to the trampoline logic.

We are going to remove Frame::tls and support trampolines for indirect calls, so we need to get rid of using Frame::tls.
In this and the followup patches I will iteratively remove all dependencies of Frame::tls and remove it eventually.

In this patch I changed wasm ABI to allocate two stack slots after stack args to preserve caller's and callee's tls'es in the near future.

This is a followup patch for removing Frame::tls.
Now, we are preserving caller's and callee's tls'es for all possible cross-instance calls in the previously allocated abi slots.
We will use them in the near future to untie c++ runtime from Frame::tls and to reload WasmTlsReg after a cross-instance call.

I'm afraid I'm about to go on vacation for a week, and probably at this point in time, Lars needs to be the better reviewer. I think he's also on vacation for I think the next two weeks, so I'm sorry about the delay.

Luke, thanks for the notice.
If you could review patches after the vacation it would be awesome. If you can't it is well, ok, /sigh/ for me to wait for Lars. In the last case I'll change you on Lars after hi returns from vacation.

BTW, patches are very simple and extremely short.

This is the third part of series of patches to Frame without tls pointer.
Here we preserve initial tls in all entry stubs and then use it to find a proper tls instance for a given frame.

To find the TlsData* for specific frame we start from a entry stub's tls
and then track tls through all possible cross-instance calls. This logic
is implemented in GetNearestEffectiveTls procedure.

Then, we use this new procedure to make singal handling free from Frame::tls.

Here we replace usage of Frame::tls in frame iteration with GetNearestEffectiveTls.
We also maintain current tls for frame iteration object to not to call GetNearestEffectiveTls everytime.

Here we remove remaining uses of Frame::tls. There are many places where
we use it: in debug frames, in profiling frames, in jit activation, etc.
All these places require short fixes to use our new scheme for getting
tls, so I gathered them together.

Let's see following code and let's assume that wasm-compiler is ion:

call foo
call bar

Ion can clobber tls inside foo and then just go with clobbered tls into bar.
It will crash if bar uses tls. At compile-time we don't know whether bar will use tls or not.

It works well when we restore tls each time when we return from a function because of the current frame structure.
But now, when we want to get rid of that Frame::tls we should guarantee that Ion doesn't clobber tls inside a function.
In fact we forces Ion to restore tls iff it clobbers it and it is a rare case.

Baseline doesn't need such logic because of its private stack slot for tls.

We remove Frame::tls and fix prologue/epilogue. Runtime, Ion and Baseline are ready for this move already.

When I did measurement I suddenly discovered that there is some small performance regression at patch 7.
My machine is not representative at all, but it is just a little bit mysterious.
At 6 part performance of direct calls: 185
At 7 part it is: 193

I haven't seen any major performance regression at patch 7.

I started investigation, compared the codes and discovered that SM debug (--enable-debug --disable-optimize) generates code which executes faster than SM release (--disable-debug --enable-optimize --enable-release).
At 7 part:
Debug: 183
Release: 195

Debug code:

   0x2802daadd2b0      push   %rbp
   0x2802daadd2b1      mov    %rsp,%rbp
   0x2802daadd2b4      sub    $0x20,%rsp
   0x2802daadd2b8      cmp    %rsp,0x30(%r14)
   0x2802daadd2bc      jb     0x2802daadd2c4
   0x2802daadd2c2      ud2
   0x2802daadd2c4      mov    %edi,0x1c(%rsp)
   0x2802daadd2c8      xor    %eax,%eax
   0x2802daadd2ca      mov    %eax,0x18(%rsp)
   0x2802daadd2ce      xchg   %ax,%ax
   0x2802daadd2d0      cmpl   $0x0,0x38(%r14)
   0x2802daadd2d5      je     0x2802daadd2dd
   0x2802daadd2db      ud2
   0x2802daadd2dd      mov    0x1c(%rsp),%eax
   0x2802daadd2e1      test   %eax,%eax
   0x2802daadd2e3      je     0x2802daadd311
   0x2802daadd2e9      sub    $0x1,%eax
   0x2802daadd2ec      mov    %eax,0x1c(%rsp)
   0x2802daadd2f0      mov    $0x7,%edi
   0x2802daadd2f5      test   $0xf,%spl
   0x2802daadd2f9      je     0x2802daadd300
   0x2802daadd2ff      int3
   0x2802daadd300      callq  0x2802daadd0a0
   0x2802daadd305      mov    %eax,%eax
   0x2802daadd307      add    0x18(%rsp),%eax
   0x2802daadd30b      mov    %eax,0x18(%rsp)
   0x2802daadd30f      jmp    0x2802daadd2ce
   0x2802daadd311      mov    0x18(%rsp),%eax
   0x2802daadd315      add    $0x20,%rsp
   0x2802daadd319      pop    %rbp
   0x2802daadd31a      retq 

Release code:

    0x1256f09c4290      push   %rbp
    0x1256f09c4291      mov    %rsp,%rbp
    0x1256f09c4294      sub    $0x20,%rsp
    0x1256f09c4298      cmp    %rsp,0x30(%r14)
    0x1256f09c429c      jb     0x1256f09c42a4 
    0x1256f09c42a2      ud2
    0x1256f09c42a4      mov    %edi,0x1c(%rsp)
    0x1256f09c42a8      xor    %eax,%eax
    0x1256f09c42aa      mov    %eax,0x18(%rsp)
    0x1256f09c42ae      xchg   %ax,%ax
    0x1256f09c42b0      cmpl   $0x0,0x38(%r14)
    0x1256f09c42b5      je     0x1256f09c42bd 
    0x1256f09c42bb      ud2
    0x1256f09c42bd      mov    0x1c(%rsp),%eax
    0x1256f09c42c1      test   %eax,%eax
    0x1256f09c42c3      je     0x1256f09c42e6 
    0x1256f09c42c9      sub    $0x1,%eax
    0x1256f09c42cc      mov    %eax,0x1c(%rsp)
    0x1256f09c42d0      mov    $0x7,%edi
    0x1256f09c42d5      callq  0x1256f09c40a0 
    0x1256f09c42da      mov    %eax,%eax
    0x1256f09c42dc      add    0x18(%rsp),%eax
    0x1256f09c42e0      mov    %eax,0x18(%rsp)
    0x1256f09c42e4      jmp    0x1256f09c42ae 
    0x1256f09c42e6      mov    0x18(%rsp),%eax
    0x1256f09c42ea      add    $0x20,%rsp
    0x1256f09c42ee      pop    %rbp           
    0x1256f09c42ef      retq

The only difference which I can see is:

test   $0xf,%spl
je     0x2802daadd300
int3

Looks like inserting this dynamic check is speeding up the code.
Looks like an alignment problem.

I'll check the SM behavior on my laptop and compare results.

If I increase the loop body:

  (func (export "run_direct") (param $count i32) (result i32)
    (local $sum i32)
    (block $END
      (loop $LOOP
        (br_if $END (i32.eqz (local.get $count)))
        (local.set $count (i32.sub (local.get $count) (i32.const 1)))
        (local.set $sum
          (i32.add (call $internal (i32.const 7))
                                   (local.get $sum)))
        (local.set $sum
          (i32.add (call $internal (i32.const 7))
                                   (local.get $sum)))
        (br $LOOP)))
    (local.get $sum))

then difference between the reference and part7 PR is very small 254.25 vs 255 so I think the previous anomaly affects only very small loops.

I also compared reference and my stack of patches on https://github.com/lars-t-hansen/embenchen in release mode with --wasm-compiler=ion
The results (lower score is better):

The first column is score for the reference and the second column is for the my stack of patches.
Looks like everything is good!

box2d:              700  691
bullet:             895  903
conditionals:       213  214
copy:               463  465
corrections:        1356 1356
fannkuch:           2334 2257
fasta:              616  616
fib:                639  612
ifs:                197  190
binarytrees:        3235 3234
memops:             979  976
primes:             1451 1454
raybench:           1430 1444
rust-fannkuch:      2189 2192
skinning:           541  539
zlib:               987  988

I also compared external and internal calls with the previous patched benchmark:

Reference:

external: 257.699951171875
internal: 254.968017578125
direct: 258.489990234375

My stack of patches:

external: 258.5400390625
internal: 234.716064453125
direct: 256.7490234375

To sum up, I can conclude that direct call aren't affected by modulo the first test, internal calls became even faster, and external calls aren't affected, so removing Frame::tls is a good independent step, imho.

Attachment #9162440 - Attachment description: Bug 1639153 - Part 1: Reserve two slots after stack arguments for the future tls preservation. r=luke → Bug 1639153 - Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth
Attachment #9162453 - Attachment description: Bug 1639153 - Part 2: Preserve callee and caller tls'es. r=luke → Bug 1639153 - Part 2: Preserve callee and caller tls'es. r=lth
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f68854b70d6a Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/296b7f5a0713 Part 2: Preserve callee and caller tls'es. r=lth

Backed out for bustage on MacroAssembler.h

backout: https://hg.mozilla.org/integration/autoland/rev/eeed6471b7255c0d93d96ff2a202567256ac5a30

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedTaskRun=Hqmhidc8RPeFuoWEjJPfPA.0&revision=296b7f5a07133f9cbbd28ab20c1468800b0e1d39

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=312226941&repo=autoland&lineNumber=21791

[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - In file included from /builds/worker/checkouts/gecko/js/src/jit/x86/SharedICRegisters-x86.h:10:
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - /builds/worker/checkouts/gecko/js/src/jit/MacroAssembler.h(4202,5): error: use of undeclared identifier 'increaseStackOffset'
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - increaseStackOffset(wasm::FrameWithTls::sizeWithoutFrame());
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - ^
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - 1 error generated.
[task 2020-08-06T06:29:23.793Z] 06:29:23 ERROR - make[4]: *** [/builds/worker/checkouts/gecko/config/rules.mk:748: Parser.obj] Error 1
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - make[4]: Leaving directory '/builds/worker/workspace/obj-build/js/src/frontend'
[task 2020-08-06T06:29:23.793Z] 06:29:23 ERROR - make[3]: *** [/builds/worker/checkouts/gecko/config/recurse.mk:72: js/src/frontend/target-objects] Error 2
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - make[3]: *** Waiting for unfinished jobs....

Flags: needinfo?(dbezhetskov)
Attachment #9162715 - Attachment description: Bug 1639153 - Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=luke → Bug 1639153 - Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/95d6c42f07fd Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/25931ec43a32 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/30af0e8c7956 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth
Attachment #9162723 - Attachment description: Bug 1639153 - Part 4: Untie frame iteration from Frame::tls. r=luke → Bug 1639153 - Part 4: Untie frame iteration from Frame::tls. r=lth
Attachment #9162736 - Attachment description: Bug 1639153 - Part 5: Remove remaining uses of Frame::tls. r=luke → Bug 1639153 - Part 5: Remove remaining uses of Frame::tls. r=lth
Attachment #9162746 - Attachment description: Bug 1639153 - Part 6: Force Ion to preserve its tls. r=luke → Bug 1639153 - Part 6: Force Ion to preserve its tls. r=lth
Regressions: 1649928
Regressions: 1657815

Unverified but plausible regression: bug 1657741.

Regressions: 1657825
Regressions: 1657917

We have a tls dependency in callWithABI for wasm because we call builtins via BuiltinThunk.
Last one requires that tls to be loaded. In this patch I gradually extend callWithABI interface to
pass offset to tls. This allows us to preserve tls and load it by offset in callWithABI and frees us from
the Frame::tls dependency.

x86 has few register so to do div/mod for i64 it call the runtime
and clobber almost all gp registers including WasmTlsReg.
To be able to call c++ runtime via Builtin thunk we need to set up WasmTlsReg.
In this patch I create dependencies from MIR level to Codegen to be sure that WasmTlsReg is alive
when we call runtime.

To be able to call c++ runtime via Builtin thunk we need to set up WasmTlsReg.
In this patch I create dependencies from MIR level to Codegen to be sure that WasmTlsReg is alive
when we call runtime in div/mod i64 for arm.

In this patch we add a tls dependency for the remaining nodes which use
BuiltinThunk to call c++ runtime. By ABI requirements WasmTlsReg should
be set.

We generate builtin call for Mod operation for Double types, so we need
to add a tls dependency. In this patch I've added it.

We generate builtin call for MTruncateToInt32 operation for floating points types,
so we need to add a tls dependency.
I inserted NYI for arm64 because Ion doesn't support arm64.

Pushed by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f1af63fb5986 Part 6: Force Ion to preserve its tls. r=lth
Pushed by ccoroiu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/93f3b8aa5bc5 Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth
Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eb35fdc50799 Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth
Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fd8adea7aa17 Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth https://hg.mozilla.org/integration/autoland/rev/9b42e0b690b6 Part 6.3: Establish dependency from tls for arm callWithABI div/mod i64. r=lth https://hg.mozilla.org/integration/autoland/rev/69a27bde5003 Part 6.4: Add tls dependency for WasmTruncateToInt64 and Int64ToFloatingPoint for arm. r=lth https://hg.mozilla.org/integration/autoland/rev/a309060018a8 Part 6.5: Add tls dependency for WasmModD. r=lth https://hg.mozilla.org/integration/autoland/rev/1e582b17eab7 Part 6.6: Add tls dependency for truncate i32. r=lth

Backed out 5 changesets (bug 1639153) for bustages complaining about Lowering.cpp.

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&selectedTaskRun=dcOWhtqBQ_22acexbHEqgw.0&fromchange=6d91f5b4cc322020443d2789407a3fe0ad50091b&searchStr=build&tochange=3010f73c9eec545c42715659f3593516ef686f70

Backout link: https://hg.mozilla.org/integration/autoland/rev/3010f73c9eec545c42715659f3593516ef686f70

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315409723&repo=autoland&lineNumber=33796

[task 2020-09-11T05:59:11.889Z] 05:59:11     INFO -  make[4]: Entering directory '/builds/worker/workspace/obj-build/js/src/jit'
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  /builds/worker/fetches/sccache/sccache /builds/worker/fetches/clang/bin/clang++ -std=gnu++17 -o Unified_cpp_js_src_jit8.o -c  -I/builds/worker/workspace/obj-build/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -DNDEBUG=1 -DTRIMMED=1 -DWASM_SUPPORTS_HUGE_MEMORY -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DJS_HAS_CTYPES -DFFI_BUILDING -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/checkouts/gecko/js/src/jit -I/builds/worker/workspace/obj-build/js/src/jit -I/builds/worker/workspace/obj-build/js/src -I/builds/worker/checkouts/gecko/js/src -I/builds/worker/workspace/obj-build/dist/include -I/builds/worker/workspace/obj-build/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-build/js/src/js-confdefs.h -Qunused-arguments -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wempty-init-stmt -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-error=tautological-type-limit-compare -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=return-std-move -Wno-error=atomic-alignment -Wno-error=deprecated-copy -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Werror=implicit-function-declaration -Wno-noexcept-type -Wno-psabi -Wno-unknown-warning-option -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fsanitize=bool,bounds,enum,integer-divide-by-zero,object-size,pointer-overflow,return,vla-bound -fno-sanitize-recover=bool,bounds,enum,integer-divide-by-zero,object-size,pointer-overflow,return,vla-bound -fsanitize-blacklist=/builds/worker/workspace/obj-build/js/src/ubsan_blacklist.txt -fsanitize=address -fcrash-diagnostics-dir=/builds/worker/artifacts -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -Xclang -load -Xclang /builds/worker/workspace/obj-build/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -O2 -gline-tables-only -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow  -MD -MP -MF .deps/Unified_cpp_js_src_jit8.o.pp   Unified_cpp_js_src_jit8.cpp
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  In file included from Unified_cpp_js_src_jit8.cpp:2:
[task 2020-09-11T05:59:11.896Z] 05:59:11    ERROR -  /builds/worker/checkouts/gecko/js/src/jit/Lowering.cpp:2362:16: error: unused variable 'opd' [-Werror,-Wunused-variable]
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -    MDefinition* opd = truncate->input();
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -                 ^
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  1 error generated.
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  /builds/worker/checkouts/gecko/config/rules.mk:723: recipe for target 'Unified_cpp_js_src_jit8.o' failed
[task 2020-09-11T05:59:11.896Z] 05:59:11    ERROR -  make[4]: *** [Unified_cpp_js_src_jit8.o] Error 1
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  make[4]: Leaving directory '/builds/worker/workspace/obj-build/js/src/jit'
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  make[4]: *** Waiting for unfinished jobs....
Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/aad37f6cbf03 Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth
Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e127d8c36c87 Backed out changeset aad37f6cbf03 for landing an incomplete stack of patches. https://hg.mozilla.org/integration/autoland/rev/1b94af3458ce Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/ab5825b43bb5 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/b79c89e6ac82 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/f2e486f1be17 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/62480de389ff Part 5: Remove remaining uses of Frame::tls. r=lth. CLOSED TREE
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/444f4554ea94 Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth
Backout by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1fc282e54b7a Backed out 5 changesets for landing the wrong stack of patches. CLOSED TREE

The backout from comment 87 is because the stack from comment 85 wasn't supposed to be landed .

@abutkovits , let's land this one https://phabricator.services.mozilla.com/D89243
The naming might be confusing but 6.5 is an internal numbering.
I want to land this 6.5 separately because actually it is independent patch.

Flags: needinfo?(dbezhetskov) → needinfo?(abutkovits)
Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/174a622bf334 Part 6.5: Add tls dependency for WasmModD. r=lth
Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/659b24c3626c Part 6.4: Add tls dependency for WasmTruncateToInt64 and Int64ToFloatingPoint for arm. r=lth
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/83a3ed07220d Part 6.3: Establish dependency from tls for arm callWithABI div/mod i64. r=lth
Pushed by ccoroiu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/09390cf1d667 Part 6.6: Add tls dependency for truncate i32. r=lth
Attachment #9153756 - Attachment is obsolete: true
Attachment #9162755 - Attachment description: Bug 1639153 - Part 7: Remove Frame::tls. r=luke → Bug 1639153 - Part 7: Remove Frame::tls. r=lth
Regressions: 1664979
Backout by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/41fa4179cc8c Backed out 8 changesets (bug 1639153, bug 1664953) due to general instability on a CLOSED TREE

Create the proxy stub lazily for calls which go through tables.
This stub will preserve caller tls, load callee's pinned register state and etc.
In the followup patches we will fill them with the that logic, but right now let's concentrate on the runtime part.
We bake callee and tls into the trampoline stub to be able to do the switch in the near future.

Backed out from beta as well, as requested by lth:
https://hg.mozilla.org/releases/mozilla-beta/rev/a9d230c1649f

Attachment #9162746 - Attachment description: Bug 1639153 - Part 6: Force Ion to preserve its tls. r=lth → Bug 1639153 - Force Ion to preserve its tls. r=lth
Attachment #9172413 - Attachment description: Bug 1639153 - Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth → Bug 1639153 - Untie callWithAbi from Frame::tls for Baseline. r=lth
Attachment #9162746 - Attachment description: Bug 1639153 - Force Ion to preserve its tls. r=lth → Bug 1639153 - Part 6.0: Force Ion to preserve its tls. r=lth
Attachment #9172413 - Attachment description: Bug 1639153 - Untie callWithAbi from Frame::tls for Baseline. r=lth → Bug 1639153 - Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth
Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0e268b8d01fe Part 6.0: Force Ion to preserve its tls. r=lth https://hg.mozilla.org/integration/autoland/rev/f0f44cfd6a07 Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth https://hg.mozilla.org/integration/autoland/rev/c5ef1152da2b Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth https://hg.mozilla.org/integration/autoland/rev/bba9e8259714 Part 6.3: Establish dependency from tls for arm callWithABI div/mod i64. r=lth https://hg.mozilla.org/integration/autoland/rev/e2105a52bcdd Part 6.4: Add tls dependency for WasmTruncateToInt64 and Int64ToFloatingPoint for arm. r=lth https://hg.mozilla.org/integration/autoland/rev/81e645d5ccfd Part 6.5: Add tls dependency for WasmModD. r=lth https://hg.mozilla.org/integration/autoland/rev/bd8381646b47 Part 6.6: Add tls dependency for truncate i32. r=lth
Regressions: 1666051

Hi, could you please land the stack of patches together:
[https://phabricator.services.mozilla.com/D91081, https://phabricator.services.mozilla.com/D82881]
There are patches Bug 1639153 Part 1 - Part 5, then Part 7 and finally this one https://phabricator.services.mozilla.com/D91081.

Flags: needinfo?(abutkovits)

Hi Dmitry, the stack of patches cannot be landed due to the following requirement "Revision is missing a Testing Policy Project Tag." .
Please check https://phabricator.services.mozilla.com/D83064#2965334 and add the Check-in Needed flag after the requirement is fulfilled.

Flags: needinfo?(dbezhetskov)

The testing policy tag was added.

Flags: needinfo?(dbezhetskov)

Patch failed to land with :
"Details: We're sorry, Lando could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg import --no-commit -s 95 /tmp/tmpdrpaobho: applying /tmp/tmpdrpaobho

patching file js/src/wasm/WasmInstance.cpp
Hunk #1 FAILED at 1745
1 out of 1 hunks FAILED -- saving rejects to file js/src/wasm/WasmInstance.cpp.rej
abort: patch failed to apply"

Flags: needinfo?(abutkovits) → needinfo?(dbezhetskov)

This patch makes use of the new "Baldrdash2020" ABI support in Cranelift
to support the "ABI 2020" refactor in the Wasm compiler.

Regressions: 1671871

Dmitry, could you please add the missing testing-tags for the patches in this lando stack so that they can be landed?
https://lando.services.mozilla.com/D93190/

Hi :malexandru, I've added the missing test tags.

Flags: needinfo?(dbezhetskov)
Pushed by ncsoregi@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/69b8eca9b1ef Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/e8164759f391 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/a072d5ee2511 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/db39bf2058e1 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/270961083337 Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/cd503f8f9a72 Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/a24b2ff5a2f1 Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth

Backed out for spidermonkey bustage

backout: https://hg.mozilla.org/integration/autoland/rev/5b5ffbe4add97c98003026e0d9a730302161b646

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&revision=a24b2ff5a2f1a5605c6767d14417bd4f7a032586&selectedTaskRun=VsS2eOFOTeKaDHxyJHwmLA.0

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=319118736&repo=autoland&lineNumber=88070

[task 2020-10-20T11:26:36.939Z] /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/host/bin/dump_syms /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/js/src/build/libmozjs-84a1.so
[task 2020-10-20T11:26:36.939Z] Unexpected error: 'ascii' codec can't decode byte 0x90 in position 3510: ordinal not in range(128)
[task 2020-10-20T11:26:36.941Z] Traceback (most recent call last):
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 1086, in <module>
[task 2020-10-20T11:26:36.941Z] main()
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 1081, in main
[task 2020-10-20T11:26:36.941Z] dumper.Process(args[2], options.count_ctors)
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 523, in Process
[task 2020-10-20T11:26:36.941Z] self.ProcessFile(file_to_process, count_ctors=count_ctors)
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 538, in ProcessFile
[task 2020-10-20T11:26:36.941Z] file, arch_num, arch, vcs_root, dsymbundle, count_ctors=count_ctors
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 590, in ProcessFileWork
[task 2020-10-20T11:26:36.941Z] for line in proc.stdout:
[task 2020-10-20T11:26:36.941Z] File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
[task 2020-10-20T11:26:36.941Z] return codecs.ascii_decode(input, self.errors)[0]
[task 2020-10-20T11:26:36.941Z] UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 3510: ordinal not in range(128)
[task 2020-10-20T11:26:36.960Z] Traceback (most recent call last):
[task 2020-10-20T11:26:36.960Z] File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
[task 2020-10-20T11:26:36.960Z] "main", mod_spec)
[task 2020-10-20T11:26:36.960Z] File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
[task 2020-10-20T11:26:36.960Z] exec(code, run_globals)
[task 2020-10-20T11:26:36.960Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/python/mozbuild/mozbuild/action/dumpsymbols.py", line 106, in <module>
[task 2020-10-20T11:26:36.960Z] sys.exit(main(sys.argv[1:]))
[task 2020-10-20T11:26:36.960Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/python/mozbuild/mozbuild/action/dumpsymbols.py", line 102, in main
[task 2020-10-20T11:26:36.960Z] args.count_ctors)
[task 2020-10-20T11:26:36.960Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/python/mozbuild/mozbuild/action/dumpsymbols.py", line 83, in dump_symbols
[task 2020-10-20T11:26:36.960Z] out_files = subprocess.check_output(args, universal_newlines=True)
[task 2020-10-20T11:26:36.960Z] File "/usr/lib/python3.6/subprocess.py", line 336, in check_output
[task 2020-10-20T11:26:36.960Z] **kwargs).stdout
[task 2020-10-20T11:26:36.960Z] File "/usr/lib/python3.6/subprocess.py", line 418, in run
[task 2020-10-20T11:26:36.960Z] output=stdout, stderr=stderr)
[task 2020-10-20T11:26:36.960Z] subprocess.CalledProcessError: Command '['/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/_virtualenvs/init_py3/bin/python', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py', '-c', '--vcs-info', '--install-manifest=/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/_build_manifests/install/dist_include,/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/include', '-s', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/host/bin/dump_syms', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/crashreporter-symbols', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/js/src/build/libmozjs-84a1.so']' returned non-zero exit status 1.
[task 2020-10-20T11:26:36.960Z] Running: /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/_virtualenvs/init_py3/bin/python /builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py -c --vcs-info --install-manifest=/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/_build_manifests/install/dist_include,/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/include -s /builds/worker/workspace/sm-package/mozjs-84.0a1.0 /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/host/bin/dump_syms /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/crashreporter-symbols /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/js/src/build/libmozjs-84a1.so

Flags: needinfo?(dbezhetskov)

This is kind of an odd failure.

It appears that running obj-spider/dist/host/bin/dump_syms .../obj-spider/js/src/build/libmozjs-84a1.so is outputting a non-ASCII character (0x90), and python is choking on it. It's Python 3.6, and it's using univeral_newlines=True, so I would have expected this to be decoding to utf8, not ASCII, but I'm perpetually confused by this encoding stuff. Perhaps the system encoding is wrong for some reason.

I don't know if this means there's an actual symbol with a weird name, or if there was an error message with a non-ASCII character in it, or what.

I'm going to see if I can diagnose it with an interactive shell job.

This seems to be some kind of problem with the crashreporter?

Specifically: I ran the above command on an interactive loaner and saved stdout and stderr to files. The stdout file looked mostly ok, but had one line with a mangled filename:

FILE 1761 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/convert/mod.rs
FILE 1762 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/convert/num.rs
FILE 1763 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/ffi.rs
FILE 1764 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/fmt/builders.rs
FILE 1765 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/fmt/k^B@^A^E^N^C^U<90>^B^G
FILE 1766 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/fmt/mod.rs
FILE 1767 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/fmt/num.rs
FILE 1768 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/hash/mod.rs
FILE 1769 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/hash/sip.rs

That's cut & paste from the interactive terminal web page. The actual bytes can be inferred, but the only one that is higher than 0x7f is the 0x90.

That directory contains 4 files. The above entry should say float.rs. The filesystem is fine; the name is only mangled in the dump_syms output.

So I'm guessing the problem has nothing to do with this bug; it probably just changed the directory listing enough to produce the particular invalid character here rather than other invalid characters.

I'm not sure who to needinfo here, and have to run right now, so I'll post what I know.

Depends on: 1673103

I think we can try to land the stack again because 1673103 is fixed.

Flags: needinfo?(dbezhetskov)
Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/22c94980495f Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/3e82e75c4992 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/82fbedcdde75 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/25281e1775c9 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/1e72f84b4206 Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/c414a69233f0 Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/42728e99fbc5 Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth

The issue was fixed, let's land the stack.

Flags: needinfo?(dbezhetskov)
Pushed by nerli@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b24505537d70 Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/50772b4ff6aa Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/3cfa16c9f9a9 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/5dfb9fc15fba Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/5737845e8af2 Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/107ac71d3eb1 Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/49de28d6a75a Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth CLOSED TREE
Backout by abutkovits@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/adaa820203d1 Backed out 7 changesets for crashing WasmTrapHandler. DONTBUILD a=backout

The last 7 changesets got backed out for crashes reported by users:

[@ WasmTrapHandler ] from 14 installations when I checked
https://crash-stats.mozilla.org/report/index/f0eb42d8-158a-426a-9034-f57830201031

Please also check if these are related (2 installations, all startup crashes):
[@ core::ops::function::Fn::call<T> | std::sys_common::backtrace::__rust_end_short_backtrace<T> ]

crash stack 1: https://crash-stats.mozilla.org/report/index/655326bc-3a73-4588-adad-d1db40201031
crash stack 2: https://crash-stats.mozilla.org/report/index/85a6bb2b-2f8b-4de0-914d-821eb0201031

Flags: needinfo?(dbezhetskov)
Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/48d5f5918e01 Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/6d85470024c3 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/f981949ae991 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/c50443ca5619 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/1f21e8f610a7 Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/315a592cfd3c Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/beed4ac8993b Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth
Backout by ccoroiu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/19ebe2dac480 Backed out 7 changesets for accidentally landing

[@ WasmTrapHandler ] from 14 installations when I checked
Hm, the main crash (https://crash-stats.mozilla.org/report/index/f0eb42d8-158a-426a-9034-f57830201031) isn't in my code, and the corresponding stack doesn't point to me, are you sure that my changes trigger the crash?
How can I reproduce it?

Please also check if these are related (2 installations, all startup crashes):
They are rust crashes on windows x64 because not implemented: No FastCall variant of Baldrdash2020 iiuc we don't support baseline+cranelift configuration on win x64.

Flags: needinfo?(dbezhetskov) → needinfo?(aryx.bugmail)

(In reply to Dmitry Bezhetskov from comment #125)

[@ WasmTrapHandler ] from 14 installations when I checked
Hm, the main crash (https://crash-stats.mozilla.org/report/index/f0eb42d8-158a-426a-9034-f57830201031) isn't in my code, and the corresponding stack doesn't point to me, are you sure that my changes trigger the crash?
How can I reproduce it?

Example urls to reproduce the issue (according to crash reports):

Flags: needinfo?(aryx.bugmail)

I think I catch this one: https://crash-stats.mozilla.org/report/index/f0eb42d8-158a-426a-9034-f57830201031, at least I can reproduce a crash with my changes. I use a different workflow from this bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1657917) but hope that the underlying problem is the same.

I'm investigating ...

Finally fixed the issue.
The problem was in MacroAssembler::wasmCallIndirect when callee.which() == wasm::CalleeDesc::AsmJSTable. I made a typo when worked on part 2. I preserved WasmTlsReg twice on WasmCallerTLSOffsetBeforeCall but I needed to preserve it on WasmCaller* and on WasmCallee*.

Because of this
GetNearestEffectiveTls(frame) return some garbage value and then
Instance* instance = GetNearestEffectiveTls(frame)->instance; led to SIGSEGV.

Pushed by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8e95ddbb6b20 Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/a5cd8e8ce4b5 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/aeacd48f565e Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/cee7eaa8c402 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/40031ad88dbd Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/5718a8d121f9 Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/7017395d8df9 Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth
Regressions: 1679010

Fill the stub logic: preserve tls's, switch pinned register state
for callee, call callee and then restore the initial state.
Since trampoline stub produces additional interstitial frame we tag
callerFP to indicate such frames inside frame iterator.
Stack maps are also adjusted not to count non-gc slots.

Attachment #9198411 - Attachment description: Bug 1639153 - Implement trampoline tls stub. r=lth → Bug 1639153 - Implement basic logic of trampoline tls stub. r=lth

Add the dynamic check and the fastpath for call_indirect
when target instance and source instance are actually the same.

When we know that instances are equal at instantiate time we replace
pointer to the trampoline with pointer to the checked call entry.

I'm just dumping some interstitial results for our 2/3 of this long optimization.
Recently I've uploaded a few patches that have demonstrated some performance results that I want to share.
To measure the performance I used my ubuntu intel core i7 laptop with 16gb of RAM.
Also, I've used calls_bench (posted above) for tests.

Master:
external 348
internal 294
import 225
direct 140

D90992: add a simple proxy trampoline patch that actually only jumps into the real code:
external 369
internal 328
import 231
direct 142

Well, one jump introduced its cost.

D102591: implement trampoline basic logic - switch tls back and forth:
external 357
internal 391
import 231
direct 142

Trampoline logic has its cost and so internal calls get significant slowdown.

D103573: let's add a dynamic check if call_indirect actually calls a function from our instance
external 397
internal 259
import 227
direct 142

Internal calls now actually become faster, but we can do better.

D103574: at instantiating time we can check if instances are equal and replace the trampoline pointer with a pointer to the function call checked entry.
external 405
internal 209
import 226
direct 141

In the end, we get a 16% slowdown for external calls and a 29% speedup for internal calls without any significant effect for import and direct calls.
I don't know how it affects wild-industrial use cases but the numbers are looked promising.

The last part of the optimization will be changing the representation of WebAssembly.Table.Elem that we can use for speeding up table.get/set inside wasm.

14 April integration test of current patch set:

  • Applying the current four-patch set to this morning's mozilla-central went smoothly (one small formatting conflict).
  • x64-release-debug builds completed OK.
  • x64-release-debug jit-test runs on wasm/ and asm.js/ (without --tbpl) had 91 failures: assertions and crashes around stack traversal, reference types, and exception handling.
  • x86-release-debug jit-tests ditto had 80 failures, same categories
  • arm64-release-debug builds require --disable-cranelift for the time being, this is OK.
  • arm64-release-debug jit-test runs on wasm/ and asm.js/ (without --tbpl) had 130 failures: assertions and crashes as above but also some UNIMPLEMENTED simulator crashes and a crash warning about an unaligned memory access (somebody forgot to untag a pointer?)
  • arm-release-debug tests ditto had 114 failures, about as for arm64, including simulator crashes

It's fairly high priority to make the patches that are up for review pass at least the integration tests, so that the review sees code that is plausibly correct.

Flags: needinfo?(dbezhetskov)
Depends on: 1675239
Blocks: 1340235
Priority: P3 → P2

Introduce indirect stubs for call_indirect.
Use special interstitial trampolines for:
a) call_indirect some_imported_table idx
because we don't know what are calling
b) call_indirect our_private_table wasm_func_idx
because we actually know that this is a cross-instance call_instance call

for all other cases we can use raw function pointers because
(a) we don't change the instance
and (b) our wasm ABI guarantee that after the call tls* will be in the valid state, so
the following cases use raw function calls:

  • call_indirect our_private_table js_func_idx
  • call_indirect our_private_table internal_wasm_func

Trampolines or indirect stubs needs to be bulk created only for exported functions
and for internal functions that are inserted into exported tables.

Introduce indirect stubs for call_indirect.
Use special interstitial trampolines for:
a) call_indirect some_imported_table idx
because we don't know what are calling
b) call_indirect our_private_table wasm_func_idx
because we actually know that this is a cross-instance call_instance call

for all other cases we can use raw function pointers because
(a) we don't change the instance
and (b) our wasm ABI guarantee that after the call tls* will be in the valid state, so
the following cases use raw function calls:
- call_indirect our_private_table js_func_idx
- call_indirect our_private_table internal_wasm_func

Trampolines or indirect stubs needs to be bulk created only for exported functions
and for internal functions that are inserted into exported tables.

Attachment #9225802 - Attachment description: Bug 1639153 - Introduce thunk stubs for indirect calls. r=lth → Bug 1639153 - Introduce indirect stubs to optimize call_indirect. r=lth
Attachment #9222051 - Attachment is obsolete: true
Attachment #9200275 - Attachment is obsolete: true
Attachment #9200274 - Attachment is obsolete: true
Attachment #9198411 - Attachment is obsolete: true
Attachment #9177105 - Attachment is obsolete: true

(Some initial notes, I will edit these as more data become available.)

Base rev: mozilla-central 7c3ea3514425. Applied the latest (Jul 8) rev of D117123. Built it with ../configure --disable-debug --enable-optimize --enable-release --enable-debug-symbols and also made a comparable build from the base rev.

The benchmark is the call_indirect_ubench.js attached above. I run the JS shell with --wasm-compiler=ion, no other switches, loading the benchmark file on the command line.

Consistently on my Xeon E5-2637 system with Fedora 33 in the JS shell, comparing with-the-patch to without-the-patch:
"external" calls (ie cross-module indirect calls) drop running time by 14% (highly surprising, these ought to be be slower)
"internal" calls (ie same-module indirect calls) drop running time by 23% (almost precisely what Dmitry has reported)
"direct" calls increase running time by 6% (very surprising, these ought to be invariant)

On a Core-i7 with macOS 11 (this is a 2018 MacBook Pro), ditto:
"external" calls slow down slightly, but really not much (very nice)
"internal" calls speed up by about 14% (so less than on the Xeon and what Dmitry reported)
"direct" calls are roughly invariant (as desired)

On an Apple M1 with macOS 11 (M1 MacBook Pro):
"external" calls slow down by a lot (running time goes from 270 to 430, ie increasing by 60%)
"internal" calls speed up by about 15% (comparable to i7)
"direct" calls are roughly invariant (as desired)

Another set of benchmarks. This is doubly-recursive fibonacci(40) using indirect calls. There are four cases:

  • DIR: direct calls
  • IIP: indirect calls, intra-module with a private table
  • IIE: indirect calls, intra-module but with a shared (exported) table
  • IIX: indirect calls, inter-module (requires two modules and a shared table)
                Baseline                         Improved
       DIR   IIP   IIE    IIX             DIR   IIP    IIE   IIX
Xeon   753   1410  1425   1425            754   1070   1300  2080
i7     647   1130  1130   1140            645    836   1100  1520
M1     650   1050  1045   1045            650    770    980  1420

On all CPUs, in the baseline case the indirect programs all have roughly the same performance, which is what we want to see, and the direct case is much faster. Also, the direct cases have the same performance with the old and the new code, as expected & desired. For the indirect cases with the improved code, we see that there's a significant cost to the stubs: the IIX case takes almost twice as long as the IIP case. That said, the IIX case is "only" about 30% - 40% slower than the IIX case of the baseline code. The IIE case shows off an optimization where we can't statically say that a call is intramodule but we can detect it dynamically (by comparing the Tls values). This optimization is very effective and will be worthwhile if we think that this case will be common.

Attached file fib.tar (deleted) —

Fib benchmarks.

Thanks Lars for the testing,

I've run your fib tests too on my threadripper/Ubuntu 20.04 machine with the same configurations.
I've got the following results:

        Baseline                           Improved                          Improved - dynamic check
DIR   IIP   IIE    IIX             DIR   IIP    IIE   IIX                  DIR   IIP    IIE   IIX
633   1187  1180   1256            635   849   1029  1993                  635   853   1398  1799

I've included numbers for the version where we don't do the dynamic check because imho it is a rare case and removing it improving results for IIX a little bit.

Flags: needinfo?(dbezhetskov)

I think about call_indirect usage and find this research (https://www.usenix.org/system/files/sec20-lehmann.pdf) Table 2. It looks like the number of call_indirect can be significant while almost all wasm usages are single module usages, so I think we should care less about truly external cases, wdyt?

Flags: needinfo?(lhansen)

almost all wasm usages are single module usages

I'm a little skeptical about that (is our population all wasm programs on the net or only the prominent/important ones; there are fewer of the latter, but the number of instances of the latter probably dwarfs the number of instances of the former) but I think that when a program is multi-module this will tend to be because it uses dynamic linking, and the cost is probably affordable.

I'm probably willing to believe that we can afford the extra cost of truly external calls. I think we should probably keep the dynamic check for now but once we have better heuristics for when a table is private we should reevaluate it.

Flags: needinfo?(lhansen)

I've done a little bit of memory profiling, not very systematic yet, just testing the waters.

The methodology for this experiment is to disable the wasm baseline jit (because some programs will be compiled without tiering and for the programs that are, object code caching will restore the optimized code on reload) and then load some wasm-using pages, logging the wasm machine code allocations.

Google earth w/o patch: 34668544 + 65536
Google earth w/ patch: 31260672 + 5242880 + 65536

This illustrates (a) that the patch can have a significant positive effect on the machine code size for the program itself (the 30+MB allocation goes from 34.7MB to 31.3MB), because indirect calls are represented by a more compact machine code sequence and (b) that the indirect stubs can take a significant amount of space, here about 17% of the machine code. The overall increase caused by the patch is only 5% however, due to the mentioned savings.

PSPDFKIT ("standalone" demo) w/o patch: 23265280
PSPDFKIT w/ patch: 23003136 + 1835008

Here, there are scant savings from more compact code, and though there's a relatively smaller amount of stub code for indirect calls, there is a total increase of 7% in executable memory.

ruffle.rs is a flash emulator. Go to the site, then click "demo", then select "Saturday morning watchmen" from the top right menu.

w/o patch: 1900544 + 65536 + 8126464 + 65536
w/ patch: 1835008 + 65536 + 131072 + 65536 + 8060928 + 65536 + 131072 + 262144 + 65536

Here we may see the effect of lazily creating some stubs, and/or the effect of storing the stub with the calling module. In both runs there are two large blobs, call them "first page" and "second page". During the second run, there is (131072 + 65536) extra memory allocated for the first page and (131072 + 262144 + 65536) for the second page, this memory is all for indirect stubs. All told we're seeing a 5% increase in executable memory for the program.

(Obviously there are some quantization effects in all of these numbers since only full 64KB pages can be allocated.)

I don't think these increases are terribly scary. I do think that we need to focus a bit on bringing these numbers down, if we can, by generating more compact stubs (removing the dynamic check for the intra-instance call would help) or by generating fewer (more aggressive determination of whether a table is private to an instance). I don't think that should block landing.

It's a little harder to test the autocad web app (web.autocad.com), there's probably a bug somewhere... in ion... Anyway, disable ion and enable baseline, login to the site and "stay logged in" and then bring up a blank tab, close the autocad tab, and then quit. Do this in both browsers. Now when the browser starts it should start in the blank tab. (Without this, it could look like the browser tried to load the autocad site on startup, also weird.) To test, load web.autocad.com, click on "samples", and select the roller assembly sample, this comes up quickly enough. Then Ctrl-L and type about:blank and enter, to unload the page.

Here there are many allocations, so I've captured a log and processed the log. In the end, without the patch the executable memory for wasm reaches 188.7MB at peak, and with the patch it reaches 200.0MB, a 6% increase. Since this is baseline code, we should expect the actual relative increase to be larger than that (with a measured peak being lower). If I can generate Ion numbers, I will post an update. (Edit: Ion problem tracked by bug 1728781.)

It could look like there are about 100 more allocations with the patch than without; these would be for individual pages that are needed for lazy indirect stubs, I expect. (Edit: Indeed this works out really well: 100 allocations should work out to about 7MB, there's also a separate 7.5MB allocation in the run with the patch, for 14.5MB total; meanwhile, the program's code with the patch is about 3.5MB smaller. 14.5-3.5=11, which is very close to the difference between the runs.)

Autocad was always considered a hard problem due to its structure; it's encouraging that this looks not particularly worse than other sites, so far.

Preliminary Ion numbers for web.autocad.com: without the patch the peak is 141.4MB, with the patch the peak is 150.0 MB, for a 6% increase here as well. This is OK - we want to try to shrink it, but we can live with it. Again, there are about 100 more individual lazy allocations totalling about 6.5MB, and an additional large 7.5MB eager allocation, but a 6.5MB savings on program object code.

Summarizing the memory profiling:

  • we are up about 6% on peak executable memory consumption across all programs
  • this is not great but is totally acceptable; we should however dig into these numbers a bit to look for savings
  • if we remove the dynamic same-instance check from the stubs the eager stub allocations will likely be more compact
  • once we improve the code for avoiding even eager stub allocations for known-same-instance calls, eager stub allocations may go away, though later bulk-upgrade of tables when they are discovered to not be same-instance will remove those savings in some cases
  • once we use luke's clever idea for lazy stubs, most executable allocations for lazy stubs will disappear because we can allocate them densely instead one on every page. this will be very helpful for eg autocad, which has many lazy allocations, but for the other applications i don't know see that it will help much.

Possible bug: I'm not sure why we need to be allocating 64KB of executable memory for each lazy stub, as this computer uses 4KB pages, but that's what it looks like. This may be worth looking into soon. We allocate in 4KB chunks and this is the best we can do, probably.

Pushed by dbezhetskov@igalia.com: https://hg.mozilla.org/integration/autoland/rev/d6506b5ee07b Introduce indirect stubs to optimize call_indirect. r=lth
Pushed by dbezhetskov@igalia.com: https://hg.mozilla.org/integration/autoland/rev/a0e51f584680 Introduce indirect stubs to optimize call_indirect. r=lth

Backed out for causing Bug 1731031.

Pushed by dbezhetskov@igalia.com: https://hg.mozilla.org/integration/autoland/rev/1a73cfde72e8 Introduce indirect stubs to optimize call_indirect. r=lth
Flags: needinfo?(dbezhetskov)
Regressions: 1733040
Regressions: 1733016

Backed out for causing the Bug 1733016.

(In reply to Cristian Tuns from comment #151)

Backed out for causing Bug 1731031.

== Change summary for alert #31797 (as of Mon, 11 Oct 2021 11:56:06 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
2% wasm-godot-baseline macosx1015-64-shippable-qr webrender 499.30 -> 487.07

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31797

Wohoo, I can reproduce on try (second 'p' failure on https://treeherder.mozilla.org/jobs?repo=try&revision=088f448eeb74e9b3ea34cc54705ee09ebbe1dc19&group_state=expanded&selectedTaskRun=fRNFivQtQQmwxP9SMsFYYQ.0), this is the same fault in a new test case. The symptoms are the same too: we get an indirect call to null to an element that should have been initialized by an active element segment initializer. This is the third case, the other two are https://treeherder.mozilla.org/logviewer?job_id=353031198&repo=autoland&lineNumber=47748 and https://treeherder.mozilla.org/logviewer?job_id=352999489&repo=autoland&lineNumber=28275.

(I'm not really any closer to understanding what this is but I'm currently going over the patch line by line to see if there's anything I've missed.)

Depends on: 1742053
Summary: Optimize indirect calls → Optimize indirect calls by not pessimizing same-instance calls
Depends on: 1742930
No longer blocks: 1340235
Blocks: 1742930
No longer depends on: 1742930
Attachment #9225802 - Attachment is obsolete: true
Blocks: 1743586

Remaining blocking bug is considered fixed, so closing this as fixed. Further work to be filed as individual bugs under the umbrella of bug 1742930.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → FIXED
Regressions: 1750930
Regressions: 1752870

Bug 1742053, which landed a refactoring of the last work here, was backed out by bug 1753061. We're not going to attempt to re-land, but will be adopting a different strategy, see bug 1754377.

Regressions: 1715459
Target Milestone: --- → 84 Branch
Blocks: 1754377
No longer blocks: 1754377
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: