1639153 - Optimize indirect calls by not pessimizing same-instance calls

Dmitry Bezhetskov

Assignee

Description

•

5 years ago

No description provided.

Dmitry Bezhetskov

Assignee

Comment 1

•

5 years ago

Attached file Bug 1639153 - change call_indirect signature check to happen after frame is pushed r=lth,wingo (deleted) — Details

Phabricator Automation

Updated

•

5 years ago

Assignee: nobody → dbezhetskov

Attachment #9150086 - Attachment description: Bug 1639153 - change call_indirect signature check to happen after frame is pushed → Bug 1639153 - change call_indirect signature check to happen after frame is pushed r=lth,wingo

Luke Wagner [:luke]

Comment 2

•

5 years ago

I'm a bit confused about putting 2 extra words in each FunctionTableElem; I had been thinking the goal state is to get tables down to a single word. What's the benefit of performing the call_indirect signature check after pushing the frame? If the signature happens before pushing a frame, the trampoline can first check the signature, then push the frame, then call the tail entry (which would be after the signature check). Then there are only two entries per function.

Also nit: could we rename "tail entry" to "trampoline entry"? "Tail" keeps making me thing "tail call entry".

Dmitry Bezhetskov

Assignee

Comment 3

•

5 years ago

Hi, I understand your confusion.
Our final target is still to have two entries table and trampoline.

Final goal:
call_indirect -> call table_entry -> Trampoline stub -> target function

Trampoline stub:
check signature
check for instance equality
switch to callee instance
call the trampoline_entry
switch to caller instance

However, it's hard to achieve this in one commit and if I succeed to create such big commit (I tried to do this) it takes so much time because I don't know what could go wrong, so Andy and I decided to split this work into series of extremely short patches. It's maintained in the Steps to get there section of the abi-2020 doc (https://docs.google.com/document/d/1oi6ROZJuE-hBb21Tgq-XcUegPktecF52gADzBJcYL34).

Long story short:
I will push inlined trampoline stub right in call_indirect and then adapt stack maps and frame iteration.
Next, I'm going to finally remove Frame::tls.
Then we move inlined trampoline into the stub. Here we don't need to care about runtime stuff because it's already done in the previous steps.

Because I'm going to inline trampoline stub on caller side we need to move signature check after the frame is pushed. Because obviously, we can't do the check on the caller side and this is why I moved it.

WDYT?

Flags: needinfo?(luke)

Dmitry Bezhetskov

Assignee

Comment 4

•

5 years ago

An additional question about the final design and the current signature check optimization:
AFAIU a thunk (aka trampoline aka decorator) is a fixed size stub with some connected data and it will be generated independently from the function.

So, we can't bake cmp with immediate into it, as we do know.
We can only load funcTypeId from memory and then do a check, right?

Andy Wingo [:wingo]

Comment 5

•

5 years ago

To add on -- we will eventually need to be able to perform signature checks with a frame already on the stack, in the tail-call case. So it's more consistent if the frame is always present when signature-check traps are raised.

Andy Wingo [:wingo]

Comment 6

•

5 years ago

To expand also on the number of function entry points -- right now only the function is in a position to check its signature. So we need a "call entry" (target of a call instruction) and a "tail entry" (target of a jump), and for each we need a checked and unchecked variant. Before Dmitry's work there are two entries (checked and unchecked call). Dmitry's current patch adds checked and unchecked tail entries.

In the future it would be possible to put the signature in the table associating code pointer, JSFunction*, and TlsData, and arrange to only check signatures in the trampoline -- if that's what we want to do. (After all, anyone that calls direct already knows that the callee has the right type.) That would allow us to reduce to only a call and a tail entry. Not sure if a good optimization or not, though.

WDYT Luke?

Andy Wingo [:wingo]

Comment 7

•

5 years ago

Apologies for the battery of comments -- a tail entry is indeed that: an entry for a tail call. Whether it's a tail call resulting from a WebAssembly tail-call proposal (something we're not working on) or the result of a jump from a trampoline is a detail -- the salient point is that the fixed part of the frame is already on the stack.

Andy Wingo [:wingo]

Comment 8

•

5 years ago

Haha another small apology -- Luke you may be wondering, why implement tail entries when we are not working on tail calls?

The answer is in the design doc:

As an optimization, when control enters an indirect entry trampoline, the trampoline can check if the incoming WasmTlsReg is actually same-instance, and in that case can jump to the type-checking tail entry. In that case there will be no trampoline frame and the callee will return directly to the caller.

So in that case, the trampoline can tail-call the callee, by jumping to the callee's tail entry. But because the trampoline can't currently check the callee's signature, we target the checked tail entry.

AFAIU the unchecked tail entry would only ever be invoked by a direct tail call from the tail calls proposal, so I guess that bit isn't strictly necessary.

Luke Wagner [:luke]

Comment 9

•

5 years ago

Thanks for all the responses and info; understanding that the goal here is to break up a larger patch into smaller independent patches makes total sense. Would it make sense to hold off on landing the patches until the final state is better than the current state? I have a light concern that we get stuck for a while at an in-between state.

Also, "tail call entry" now makes sense; I like the symmetry between trampolines and tail calls :)

The 4 entry points also make sense. Instead of recording dynamic offsets for all of them, would it be possible (now or later) to have static offsets, such that from a pointer to any one entry point you can compute the others? The reason I ask is thinking ahead toward the function-references proposal when you have one function pointer (a funcref or (ref (func ...))) that can be called with a fixed or dynamic signature with a normal or tail call, so it seems like we'd want to have function references point to one canonical entry point and have all the other entry points be static offsets made at the callsite.

(In reply to Dmitry Bezhetskov from comment #4)

An additional question about the final design and the current signature check optimization:
AFAIU a thunk (aka trampoline aka decorator) is a fixed size stub with some connected data and it will be generated independently from the function.

So, we can't bake cmp with immediate into it, as we do know.
We can only load funcTypeId from memory and then do a check, right?

Given a wasm function signature (and no other context), you can statically classify it as having either an immediate representation (where the signature is literally baked into the immediate, which means the immediate is invariant over executions and thus can be (de)serialized without patched) or requires an indirect representation. Thus, if the signature is classified as an immediate, then the trampoline can indeed bake in the cmp-with-immediate. (And if the signature is classified as indirect, then the trampoline can load it from the callee TlsData.)

Flags: needinfo?(luke)

Andy Wingo [:wingo]

Comment 10

•

5 years ago

Good morning! Good questions, Luke :) Point by point:

About patch landing order. Completely agreed that we need to avoid shipping regressions. We operate a bit on instinct here as to what is a regression and what isn't; my impression was that in the planned change series at https://docs.google.com/document/d/1oi6ROZJuE-hBb21Tgq-XcUegPktecF52gADzBJcYL34/edit#heading=h.22ki0jodkf2h, no change would regress in a significant way.

I understand your concern to also be about leaving the tree in an awkward state for a long time, but I think Dmitry is on a roll at this point and when the inline trampoline code lands -- which is the next patch or so -- then we should start to see net performance improvements.

If that were all, I would be inclined to agree with you to hold off until a series can land, but on our side there has been the risk of making patches too big to review, or patches that don't do the right thing, or patches that simply need some style fixes. I think the pressure of getting something good enough to incrementally land has been a real positive thing for progress on this feature and so weighing everything, I would continue to land small steps toward the goal -- as long as we are all generally on board about the goal and the general plan for getting there.

Static offsets would be great! Right now I think that might not be possible because of whether the function has an immediate signature check or not. But yes, I think you are right we can get there.
Immediate signature checks: I think we were under the impression that in the final design, trampoline code would not be signature-dependent -- and so if you were to consider moving signature checks to trampolines, probably you would make all checks load the expected signature from the storage associated with the trampoline instead of having immediate comparisons. I was also unsure if immediate vs load-from-memory was important here -- the branch would surely predict perfectly, and it would "just" be another couple µops in the pipeline per indirect call. Dunno. WDYT here? As I said we have to operate a bit on instincts and ours could be miscalibrated.

Luke Wagner [:luke]

Comment 11

•

5 years ago

Thanks for the explanations!

Makes sense, sgtm
I was thinking that differences of that nature could be covered up by nop'ing up to the desired static offset. But agreed that we don't need to do this now.
Agreed that immediate vs load-from-memory is probably not a huge deal, but just in the interest of not giving it up unnecessarily: how would the trampoline be signature independent? My operating assumption has been that tramplines are statically specialized to the callee (so that they can use a direct call, and also bake in the callee's instance pointer as an immediate), and thus they could know the callee's expected signature as an immediate

Getting back to whether the signature check happens before or after pushing the frame: it makes sense that, for tail calls, the callee frame will have already been pushed, so this should be allowed. However, I think doing so for all call entries will cause extra prologue code to be generated for the two checked entries per function (since they must push their own frame and then jump over the frame-pushing of the unchecked entry). If the signature can be checked before pushing the frame, then the checked entries can check and then fall-through to the frame-pushing of the unchecked entries. IIRC from my measurements, b/c of the large number of small functions in Unity-type apps, prologue-generated machine code is a significant fraction of overall code size. Also icache locality would be somewhat improved. But I forget: are there any concrete problems with checking the signature before pushing the frame?

Andy Wingo [:wingo]

Comment 12

•

5 years ago

Are there any concrete problems with checking the signature before pushing the frame?

There are two nits and one serious problem.

Nit: With tail calls, you have to deal with the signature check trap being raised either with or without the frame.
Nit: If overhead is a problem and tail calls will come, then we should find a solution that allows tail calls without overhead.
Problem: If a function is called from a trampoline, it needs to not set FP and instead use the FP set by the trampoline.

I think this last issue is the motivator to have checked call entries and checked tail entries. WDYT @luke ?

Dmitry Bezhetskov

Assignee

Comment 13

•

5 years ago

Attached file Bug 1639153 - call_indirect pushes trampoline, jumps to checked tail entry r=lth,wingo (obsolete) (deleted) — Details

Dmitry Bezhetskov

Assignee

Comment 14

•

5 years ago

Luke, what is the plan for import calls? They use Frame::tls too, I suppose we are going to handle call_import in the same manner as call_indirect? It means that we are going to call thunk to switch instances.
WDYT?

Flags: needinfo?(luke)

Luke Wagner [:luke]

Comment 15

•

5 years ago

Yeah, good question. I wasn't thinking about import calls b/c they are usually just simplified versions of indirect calls in that you statically know you're calling out of the instance. Thus, if it was easy, it would make sense to statically (at the callsite) do something instead of using a trampoline. That being said, I don't see an easy way to do that in this case so yeah, probably best just to call a trampoline just like indirect calls.

Flags: needinfo?(luke)

Dmitry Bezhetskov

Assignee

Comment 16

•

5 years ago

Attached file Bug 1639153 - Remove using Frame::tls in Jit import exit stub r=lth,wingo (deleted) — Details

Dmitry Bezhetskov

Assignee

Comment 17

•

5 years ago

There are still a few remaining uses of masm.loadWasmTlsRegFromFrame():

CodeGenerator::visitWasmCall
here we reload Tls because Ion assumes that WasmTlsReg is non-volatile.
After implementing trampoline and uses them for call_import and call_indirect we can safely remove this.
MacroAssembler::callWithABI
afaik we set tls here only for wasm BuiltinThunks. A BuiltinThunk uses WasmTlsReg only in the prologue/epilogue, so when we delete Frame::tls this will become obsolete and we can remove it then.

Pulsebot

Comment 18

•

4 years ago

Pushed by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/578160fb0ac3 change call_indirect signature check to happen after frame is pushed r=lth,wingo

Bogdan Tara[:bogdan_tara | bogdant]

Comment 19

•

4 years ago

Backed out changeset 578160fb0ac3 (bug 1639153) for WasmFrameIter.cpp related bustage

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=build&fromchange=578160fb0ac35c82a12139b9de86ecf2d791928a&tochange=dac0f3345b5984106adcf22d20e2bceb4a8fb4e6&selectedTaskRun=NWgXbXLXTBa5zNMyhbNQyA-0

Backout link: https://hg.mozilla.org/integration/autoland/rev/dac0f3345b5984106adcf22d20e2bceb4a8fb4e6

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305426059&repo=autoland&lineNumber=5524

task 2020-06-08T06:29:22.459Z] make[3]: Leaving directory '/builds/worker/workspace/build/src/obj-spider/js/src/irregexp'
[task 2020-06-08T06:29:22.459Z] /builds/worker/fetches/gcc/bin/g++ -std=gnu++17 -o Unified_cpp_js_src_frontend0.o -c  -I/builds/worker/workspace/build/src/obj-spider/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/workspace/build/src/js/src/frontend -I/builds/worker/workspace/build/src/obj-spider/js/src/frontend -I/builds/worker/workspace/build/src/obj-spider/js/src -I/builds/worker/workspace/build/src/js/src -I/builds/worker/workspace/build/src/obj-spider/dist/include -I/builds/worker/workspace/build/src/obj-spider/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-spider/js/src/js-confdefs.h -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wformat -Wformat-overflow=2 -Wno-noexcept-type -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -freorder-blocks -O3 -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow -Wno-attributes  -MD -MP -MF .deps/Unified_cpp_js_src_frontend0.o.pp   Unified_cpp_js_src_frontend0.cpp
[task 2020-06-08T06:29:22.459Z] js/src/frontend/Unified_cpp_js_src_frontend1.o
[task 2020-06-08T06:29:23.263Z] /builds/worker/fetches/gcc/bin/g++ -std=gnu++17 -o Unified_cpp_js_src_jit10.o -c  -I/builds/worker/workspace/build/src/obj-spider/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/workspace/build/src/js/src/jit -I/builds/worker/workspace/build/src/obj-spider/js/src/jit -I/builds/worker/workspace/build/src/obj-spider/js/src -I/builds/worker/workspace/build/src/js/src -I/builds/worker/workspace/build/src/obj-spider/dist/include -I/builds/worker/workspace/build/src/obj-spider/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-spider/js/src/js-confdefs.h -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wformat -Wformat-overflow=2 -Wno-noexcept-type -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -freorder-blocks -O3 -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow -Wno-attributes  -MD -MP -MF .deps/Unified_cpp_js_src_jit10.o.pp   Unified_cpp_js_src_jit10.cpp
[task 2020-06-08T06:29:23.264Z] js/src/jit/Unified_cpp_js_src_jit11.o
[task 2020-06-08T06:29:23.986Z] /builds/worker/fetches/gcc/bin/g++ -std=gnu++17 -o Unified_cpp_js_src_wasm1.o -c  -I/builds/worker/workspace/build/src/obj-spider/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/workspace/build/src/js/src/wasm -I/builds/worker/workspace/build/src/obj-spider/js/src/wasm -I/builds/worker/workspace/build/src/obj-spider/js/src -I/builds/worker/workspace/build/src/js/src -I/builds/worker/workspace/build/src/obj-spider/dist/include -I/builds/worker/workspace/build/src/obj-spider/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-spider/js/src/js-confdefs.h -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wformat -Wformat-overflow=2 -Wno-noexcept-type -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -freorder-blocks -O3 -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow -Wno-attributes  -MD -MP -MF .deps/Unified_cpp_js_src_wasm1.o.pp   Unified_cpp_js_src_wasm1.cpp
[task 2020-06-08T06:29:23.986Z] js/src/wasm/Unified_cpp_js_src_wasm2.o
[task 2020-06-08T06:29:26.656Z] In file included from Unified_cpp_js_src_wasm1.cpp:11:0:
[task 2020-06-08T06:29:26.656Z] /builds/worker/workspace/build/src/js/src/wasm/WasmFrameIter.cpp: In function 'void js::wasm::GenerateFunctionPrologue(js::jit::MacroAssembler&, const js::wasm::FuncTypeIdDesc&, const mozilla::Maybe<unsigned int>&, js::wasm::FuncOffsets*)':
[task 2020-06-08T06:29:26.656Z] /builds/worker/workspace/build/src/js/src/wasm/WasmFrameIter.cpp:581:3: error: static assertion failed: code aligned
[task 2020-06-08T06:29:26.656Z]    static_assert(CheckedTailEntryOffset % CodeAlignment == 0, "code aligned");
[task 2020-06-08T06:29:26.656Z]    ^~~~~~~~~~~~~
[task 2020-06-08T06:29:28.430Z]    Compiling regex v1.3.3
[task 2020-06-08T06:29:28.430Z]      Running `CARGO=/builds/worker/fetches/rustc/bin/cargo CARGO_MANIFEST_DIR=/builds/worker/workspace/build/src/third_party/rust/regex CARGO_PKG_AUTHORS='The Rust Project Developers' CARGO_PKG_DESCRIPTION='An implementation of regular expressions for Rust. This implementation uses
[task 2020-06-08T06:29:28.430Z] finite automata and guarantees linear time matching on all inputs.
[task 2020-06-08T06:29:28.430Z] ' CARGO_PKG_HOMEPAGE='https://github.com/rust-lang/regex' CARGO_PKG_NAME=regex CARGO_PKG_REPOSITORY='https://github.com/rust-lang/regex' CARGO_PKG_VERSION=1.3.3 CARGO_PKG_VERSION_MAJOR=1 CARGO_PKG_VERSION_MINOR=3 CARGO_PKG_VERSION_PATCH=3 CARGO_PKG_VERSION_PRE= LD_LIBRARY_PATH='/builds/worker/workspace/build/src/obj-spider/release/deps:/builds/worker/fetches/rustc/lib:/builds/worker/workspace/build/src/obj-spider/dist/bin:/builds/worker/fetches/gcc/lib64' /builds/worker/fetches/rustc/bin/rustc --crate-name regex /builds/worker/workspace/build/src/third_party/rust/regex/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts --crate-type lib --emit=dep-info,metadata,link -C opt-level=2 --cfg 'feature="aho-corasick"' --cfg 'feature="default"' --cfg 'feature="memchr"' --cfg 'feature="perf"' --cfg 'feature="perf-cache"' --cfg 'feature="perf-dfa"' --cfg 'feature="perf-inline"' --cfg 'feature="perf-literal"' --cfg 'feature="std"' --cfg 'feature="thread_local"' --cfg 'feature="unicode"' --cfg 'feature="unicode-age"' --cfg 'feature="unicode-bool"' --cfg 'feature="unicode-case"' --cfg 'feature="unicode-gencat"' --cfg 'feature="unicode-perl"' --cfg 'feature="unicode-script"' --cfg 'feature="unicode-segment"' -C metadata=59cb2151c37e484b -C extra-filename=-59cb2151c37e484b --out-dir /builds/worker/workspace/build/src/obj-spider/release/deps -C linker=/builds/worker/workspace/build/src/build/cargo-linker -L dependency=/builds/worker/workspace/build/src/obj-spider/release/deps --extern aho_corasick=/builds/worker/workspace/build/src/obj-spider/release/deps/libaho_corasick-ae60b24f6684d433.rmeta --extern memchr=/builds/worker/workspace/build/src/obj-spider/release/deps/libmemchr-f1b26a4756e91620.rmeta --extern regex_syntax=/builds/worker/workspace/build/src/obj-spider/release/deps/libregex_syntax-a209697783b4651f.rmeta --extern thread_local=/builds/worker/workspace/build/src/obj-spider/release/deps/libthread_local-df70bf44d84bd469.rmeta --cap-lints warn`
[task 2020-06-08T06:29:30.442Z] /builds/worker/workspace/build/src/config/rules.mk:746: recipe for target 'Unified_cpp_js_src_wasm1.o' failed
[task 2020-06-08T06:29:30.442Z] make[3]: *** [Unified_cpp_js_src_wasm1.o] Error 1
[task 2020-06-08T06:29:30.443Z] make[3]: Leaving directory '/builds/worker/workspace/build/src/obj-spider/js/src/wasm'
[task 2020-06-08T06:29:30.443Z] /builds/worker/workspace/build/src/config/recurse.mk:74: recipe for target 'js/src/wasm/target-objects' failed
[task 2020-06-08T06:29:30.443Z] make[2]: *** [js/src/wasm/target-objects] Error 2
[task 2020-06-08T06:29:30.443Z] make[2]: *** Waiting for unfinished jobs....
[task 2020-06-08T06:29:30.443Z] /builds/worker/fetches/gcc/bin/g++ -std=gnu++17 -o Unified_cpp_js_src_frontend1.o -c  -I/builds/worker/workspace/build/src/obj-spider/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DNDEBUG=1 -DTRIMMED=1 -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/workspace/build/src/js/src/frontend -I/builds/worker/workspace/build/src/obj-spider/js/src/frontend -I/builds/worker/workspace/build/src/obj-spider/js/src -I/builds/worker/workspace/build/src/js/src -I/builds/worker/workspace/build/src/obj-spider/dist/include -I/builds/worker/workspace/build/src/obj-spider/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-spider/js/src/js-confdefs.h -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wformat -Wformat-overflow=2 -Wno-noexcept-type -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -freorder-blocks -O3 -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow -Wno-attributes  -MD -MP -MF .deps/Unified_cpp_js_src_frontend1.o.pp   Unified_cpp_js_src_frontend1.cpp
[task 2020-06-08T06:29:30.443Z] js/src/frontend/Unified_cpp_js_src_frontend2.o
[task 2020-06-08T06:29:31.950Z] In file included from Unified_cpp_js_src_frontend0.cpp:29:0:
[task 2020-06-08T06:29:31.950Z] /builds/worker/workspace/build/src/js/src/frontend/BytecodeEmitter.cpp: In member function 'bool js::frontend::BytecodeEmitter::emitSetOrInitializeDestructuring(js::frontend::ParseNode*, js::frontend::DestructuringFlavor)':
[task 2020-06-08T06:29:31.950Z] /builds/worker/workspace/build/src/js/src/frontend/BytecodeEmitter.cpp:2653:48: warning: 'kind' may be used uninitialized in this function [-Wmaybe-uninitialized]
[task 2020-06-08T06:29:31.950Z]          NameOpEmitter noe(this, name, loc, kind);
[task 2020-06-08T06:29:31.950Z]                                                 ^
[task 2020-06-08T06:29:33.100Z] /builds/worker/workspace/build/src/js/src/frontend/Parser.cpp: In member function 'typename ParseHandler::ClassNodeType js::frontend::GeneralParser<ParseHandler, Unit>::classDefinition(js::frontend::YieldHandling, js::frontend::GeneralParser<ParseHandler, Unit>::ClassContext, js::frontend::DefaultHandling) [with ParseHandler = js::frontend::FullParseHandler; Unit = char16_t]':
[task 2020-06-08T06:29:33.100Z] /builds/worker/workspace/build/src/js/src/frontend/Parser.cpp:7356:16: warning: 'innerName' may be used uninitialized in this function [-Wmaybe-uninitialized]
[task 2020-06-08T06:29:33.100Z]    NameNodeType innerName;
...

Flags: needinfo?(dbezhetskov)

Pulsebot

Comment 20

•

4 years ago

Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a4d1e2c3ba9e change call_indirect signature check to happen after frame is pushed r=lth,wingo

Alexandru Michis [:malexandru]

Comment 21

•

4 years ago

Same issue as above, different line: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305432836&repo=autoland&lineNumber=5453

Pulsebot

Comment 22

•

4 years ago

Backout by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9bd3381b2697 Backed out changeset a4d1e2c3ba9e for causing bustages in WasmFrameIter.cpp

Pulsebot

Comment 23

•

4 years ago

Pushed by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4cf40cf33b0a change call_indirect signature check to happen after frame is pushed r=lth,wingo

Razvan Maries

Comment 24

•

4 years ago

Backed out for build bustages.

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=4cf40cf33b0aa9aada6fc4896e965f2bd75226d2&selectedTaskRun=GW8mfsIDT4aInJYTd_CbAg-0

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305595073&repo=autoland&lineNumber=4976

Backout: https://hg.mozilla.org/integration/autoland/rev/7c2f0754a3118ee806ae019bce830012c53c7a47

Pulsebot

Comment 25

•

4 years ago

Pushed by rmaries@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f1beae5af856 change call_indirect signature check to happen after frame is pushed r=lth,wingo

Dmitry Bezhetskov

Assignee

Comment 26

•

4 years ago

issues with signature moving are resolved and now the changes are landed.
The problem was in --disable-jit configuration, I messed up with MacroAssembler-none.h.

Flags: needinfo?(dbezhetskov)

Noemi Erli[:noemi_erli]

Comment 27

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f1beae5af856

Status: UNCONFIRMED → RESOLVED

Closed: 4 years ago

status-firefox79: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla79

Dmitry Bezhetskov

Assignee

Updated

•

4 years ago

Keywords: leave-open

Dmitry Bezhetskov

Assignee

Updated

•

4 years ago

Status: RESOLVED → UNCONFIRMED

Resolution: FIXED → ---

Dmitry Bezhetskov

Assignee

Comment 28

•

4 years ago

Attached file Bug 1639153 - Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo (deleted) — Details

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

Status: UNCONFIRMED → ASSIGNED

status-firefox78: affected → wontfix

status-firefox79: fixed → affected

Ever confirmed: true

Target Milestone: mozilla79 → ---

Pulsebot

Comment 29

•

4 years ago

Pushed by dluca@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/251dbf12af47 Remove using Frame::tls in Jit import exit stub r=lth,wingo

Atila Butkovits

Comment 30

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/251dbf12af47

Pulsebot

Comment 31

•

4 years ago

Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f82a44c95497 Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo

Atila Butkovits

Comment 32

•

4 years ago

Backed outfor causing bustage at ion-error-ool.js.

Backout link: https://hg.mozilla.org/integration/autoland/rev/c36c8243afb8117b01288cd9f9956d34633d41fd

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=f82a44c95497c4821a758ba031078945a188bc21&selectedTaskRun=Z09HoE7UROS8aY2B6stcxQ.0

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=306466971&repo=autoland&lineNumber=44620

Flags: needinfo?(dbezhetskov)

Pulsebot

Comment 33

•

4 years ago

Pushed by ncsoregi@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b620533c6d63 Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo

Noemi Erli[:noemi_erli]

Comment 34

•

4 years ago

Backed out changeset b620533c6d63 (Bug 1639153) for causing build bustages in WasmFrameIter.cpp CLOSED TREE

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&selectedTaskRun=UZwtk93vTmKPwDc8wqFLag.0&resultStatus=testfailed%2Cbusted%2Cexception&revision=b620533c6d633ef50c5e5cb3ecd0f398206eb271

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=306641550&repo=autoland&lineNumber=4845

Backout: https://hg.mozilla.org/integration/autoland/rev/f2a097374fad969ca31468e65b0839108baca5ff

Pulsebot

Comment 35

•

4 years ago

Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/caa8552f4f52 Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo

Dmitry Bezhetskov

Assignee

Comment 36

•

4 years ago

fixed and pushed updated changes

Flags: needinfo?(dbezhetskov)

Atila Butkovits

Comment 37

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/caa8552f4f52

Dmitry Bezhetskov

Assignee

Comment 38

•

4 years ago

Attached file Bug 1639153 - Interpreter entry pushes trampoline frame to preserve tls for runtime r=lth (obsolete) (deleted) — Details

Phabricator Automation

Updated

•

4 years ago

Attachment #9157867 - Attachment description: Bug 1639153 - interpreter entry pushes trampoline frame to preserve tls for runtime r=lth → Bug 1639153 - Interpreter entry pushes trampoline frame to preserve tls for runtime r=lth

Zhao Jiazhong

Comment 39

•

4 years ago

(In reply to Pulsebot from comment #33)

Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b620533c6d63
Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo

Hi, it seems this change caused build failure on mips platform, is it ok that I submit the fix patch to this bug? Or should I file a new bug? Thanks!

Dmitry Bezhetskov

Assignee

Comment 40

•

4 years ago

Hu Zhao, I think it is OK to submit patches here if the number of patches is small.
BTW, I wonder that we support mips platform at all.

Zhao Jiazhong

Comment 41

•

4 years ago

Attached file Bug 1639153 - [MIPS] Fix a recent refactoring to wasm::Frame on mips. r=lth (deleted) — Details

Lars T Hansen [:lth]

Comment 42

•

4 years ago

(In reply to Dmitry Bezhetskov from comment #40)

BTW, I wonder that we support mips platform at all.

We do have MIPS support but it's a tier-3 platform, supported not by Mozilla but by external contributors (most of them at Longsoon). There are MIPS simulators in the tree in the same way as arm/arm64, and sometimes we try to keep the MIPS code working by running them, but usually we just change things as we want and wait for the MIPS support team to catch up.

Pulsebot

Comment 43

•

4 years ago

Pushed by ncsoregi@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5078030eaeaf [MIPS] Fix a recent refactoring to wasm::Frame on mips. r=lth

Natalia Csoregi [:nataliaCs]

Comment 44

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5078030eaeaf

Phabricator Automation

Updated

•

4 years ago

Attachment #9157867 - Attachment is obsolete: true

Lars T Hansen [:lth]

Comment 45

•

4 years ago

Attached file call_indirect_ubench.js (obsolete) (deleted) — Details

Microbenchmark for call_indirect. On my Xeon workstation, I get these numbers with yesterday's mozilla-central (JS shell, configure with --disable-debug --enable-optimize --enable-release, run with --wasm-compiler=ion):

external: 388.2451171875
internal: 333.218017578125

This suggests that in the code that has landed so far, indirect calls to module-internal functions are about 14% faster than indirect calls to imported functions.

Lars T Hansen [:lth]

Comment 46

•

4 years ago

Attached file call_indirect_ubench.js (deleted) — Details

Updated to add direct calls as a comparison. These come in at 187ms roughly, ie, at about 56% of the module-internal indirect calls.

Attachment #9159321 - Attachment is obsolete: true

Dmitry Bezhetskov

Assignee

Comment 47

•

4 years ago

For my amd machine it is:

branches/default/tip Fri Jun 26:

external: 256.566162109375
internal: 254.14599609375
direct: 186.510986328125

branches/default/tip Fri Jun 26 + additional trampoline frame:

external: 420.4189453125
internal: 279.97607421875
direct: 186.052978515625

Direct calls aren't affected.
Internal: 9% slowdown. It's because we add additional "if" there, before we just load/reload tls from the frame.
External: 64% slowdown. Pushing additional trampoline frame + if + additional call/return into/from trampoline.

If we didn't push Frame::tls we would save:
2 push for internals, one for the function's frame and one for the trampoline
3 push for external

Lars T Hansen [:lth]

Updated

•

4 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1649109

Dmitry Bezhetskov

Assignee

Comment 48

•

4 years ago

Luke, if we had the same approach with trampolines for import calls it would that they will slower than now. As you say there isn't an easy way to make a workaround for import calls because we need to somehow signalize for frame iteration that instance is changed.

Actually, I can say that there will be the same slowdown as for cross-instance case for call_indirect. It is about 50% slowdown compared to the previous version where we just WasmTlsReg <- callee's tls and call callee.code.
The current plan is good, but it has also such consequences, lemme show trade-offs for current plan:

very fast direct calls, actually no overhead at all
a little bit more code, about 4 bytes per function for nops
import calls and cross-instance call_indirect become slow. We add constant overhead for this, additional call/return + push trampoline's data
trampoline doesn't need to copy stack args because we address it from FP.

I don't have a whole picture of wasm in SpiderMonkey, but I think you have. So, these tradeoffs are ok for us or not? I'm asking because saw the web IDL binding proposal and it looks like we are going to use import calls to call WebIDL.

There is another tradeoff we can do. We can fuse your original idea with the current plan.
Let me describe this a little bit more:
Your original idea was to add slots after stack arguments and then track tls via CallSiteDesc.
I've investigated this earlier and I can say that it is doable if the number of slots is equal 2. (I have abandoned PR for this).
One for calleeTls and one other one for callerTls. We need to preserve it only on possible cross-instance calls.
The current plan with addressing from FP and trampolines allows us don't copy arguments when we call a trampoline.
So, with this fusion we have:

a little bit slowdown for all calls included direct calls because we always allocate two slots after stack args, even if there is no any.
import calls don't need to call a trampoline because runtime already knows about this transition.
the amount of memory by wasm function is increased by 2 slots for each call.
trampolines already have slots for callerTls and calleeTls so it needs only to prepare a frame for the callee and jump into it.
trampoline doesn't need to copy stack args because we address it from FP.

You may ask, what is actually slowdown for direct calls because they are very frequent in wasm demos.
I've added to slots for all calls and measured on the benchmark (provided by Lars Hansen). There we call the function with no stack arguments so it's the most frequent cause for zengarden and other benchmarks.

I've run this test many times and calculate the average. The test machine is my hp laptop with ubuntu 18.04 and inter core i7 on board.
Base revision is (5c9a4697057bd5ef8e15e6223e049674107ca433 Wed Jul 1 Bug 1147091 - Add StableSort to nsTArray_Impl. r=froydnj Simon Giesecke <sgiesecke@mozilla.com>). Flags: --disable-debug --enable-optimize --enable-release, run with --wasm-compiler=ion

with two additional slots:
external: 267.17
internal: 254.44
direct: 140.26

reference:
external: 269.44
internal: 260.42
direct: 145.84

It looks strange but this is what we have. I compared many times but it looks like there no overhead for all calls and even some improvements..
The amount of memory is increased, of course, it is about 2 slots for each call (depends on the alignment).

So, wdyt about these two cases?
First one is ok and constant overhead doesn't scare us
or we increase each call by two slots and have cheap import calls and cheap trampolines?

Flags: needinfo?(luke)

Flags: needinfo?(lth)

Lars T Hansen [:lth]

Updated

•

4 years ago

Flags: needinfo?(lth) → needinfo?(lhansen)

Dmitry Bezhetskov

Assignee

Comment 49

•

4 years ago

One possible disadvantage of the second approach is that for same-instance call_indirect we need to preserve tls because for runtime we can't distinguish between import call and call_indirect they both have kind == dynamic, but it's just two store instruction.

Lars T Hansen [:lth]

Comment 50

•

4 years ago

I think slowing down cross-instance indirect and import calls is not good, so anything we can do to mitigate that is worth thinking hard about.

I don't think "adding two extra slots" should cost anything in the no-stack-args case if we do things right. Ion creates the stack frame with enough space for all outgoing arguments on function entry, so there should be no reason to make any adjustment of the SP just prior to the call - the adjustment should be made on function entry when we create the local frame. I thought that there might be an added small cost in functions that make zero-stack-arg calls and that themselves do not currently need to allocate any stack memory, like int f(int x) { return g(x)+1 }, but currently we're allocating a frame (8 bytes unused memory) for those already to ensure proper stack alignment in the callee, so there would be no regression there either.

I'm a little more worried about code size, frankly, because programs are becoming large quickly. In the case of my function f above I see 14 bytes of NOPs on x64 currently:

00000002  55                        push %rbp
00000003  48 8b ec                  mov %rsp, %rbp
00000006  0f 1f 00                  nopl %eax, (%rax)
00000009  0f 1f 80 00 00 00 00      nopl %eax, (%rax)
00000010  41 83 fa 23               cmp $0x23, %r10d
00000014  0f 84 0c 00 00 00         jz 0x0000000000000026
0000001A  0f 0b                     ud2
0000001C  0f 1f 40 00               nopl %eax, (%rax)
00000020  41 56                     push %r14
00000022  55                        push %rbp
00000023  48 8b ec                  mov %rsp, %rbp
00000026  48 83 ec 08               sub $0x08, %rsp
...

Flags: needinfo?(lhansen)

Luke Wagner [:luke]

Comment 51

•

4 years ago

Just to check: when you say that all calls would have 2 extra slots (for callerTLS/calleeTLS), are those slots always allocated, but only initialized in the cross-instance cases by the trampolines? If so, then I agree it should be practically free so that seems like a good idea.

Flags: needinfo?(luke)

Dmitry Bezhetskov

Assignee

Comment 52

•

4 years ago

(In reply to Luke Wagner [:luke] from comment #51)

Just to check: when you say that all calls would have 2 extra slots (for callerTLS/calleeTLS), are those slots always allocated, but only initialized in the cross-instance cases by the trampolines? If so, then I agree it should be practically free so that seems like a good idea.

We need to fill these slots for indirect calls. Imports calls will always store tls and calls via trampolines will fill them according to the trampoline logic.

Benjamin Bouvier [:bbouvier] (inactive)

Updated

•

4 years ago

Regressions: 1651468

Dmitry Bezhetskov

Assignee

Comment 53

•

4 years ago

Attached file Bug 1639153 - Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth (deleted) — Details

We are going to remove Frame::tls and support trampolines for indirect calls, so we need to get rid of using Frame::tls.
In this and the followup patches I will iteratively remove all dependencies of Frame::tls and remove it eventually.

In this patch I changed wasm ABI to allocate two stack slots after stack args to preserve caller's and callee's tls'es in the near future.

Dmitry Bezhetskov

Assignee

Comment 54

•

4 years ago

Attached file Bug 1639153 - Part 2: Preserve callee and caller tls'es. r=lth (deleted) — Details

This is a followup patch for removing Frame::tls.
Now, we are preserving caller's and callee's tls'es for all possible cross-instance calls in the previously allocated abi slots.
We will use them in the near future to untie c++ runtime from Frame::tls and to reload WasmTlsReg after a cross-instance call.

Luke Wagner [:luke]

Comment 55

•

4 years ago

I'm afraid I'm about to go on vacation for a week, and probably at this point in time, Lars needs to be the better reviewer. I think he's also on vacation for I think the next two weeks, so I'm sorry about the delay.

Dmitry Bezhetskov

Assignee

Comment 56

•

4 years ago

Luke, thanks for the notice.
If you could review patches after the vacation it would be awesome. If you can't it is well, ok, /sigh/ for me to wait for Lars. In the last case I'll change you on Lars after hi returns from vacation.

BTW, patches are very simple and extremely short.

Dmitry Bezhetskov

Assignee

Comment 57

•

4 years ago

Attached file Bug 1639153 - Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth (deleted) — Details

This is the third part of series of patches to Frame without tls pointer.
Here we preserve initial tls in all entry stubs and then use it to find a proper tls instance for a given frame.

To find the TlsData* for specific frame we start from a entry stub's tls
and then track tls through all possible cross-instance calls. This logic
is implemented in GetNearestEffectiveTls procedure.

Then, we use this new procedure to make singal handling free from Frame::tls.

Dmitry Bezhetskov

Assignee

Comment 58

•

4 years ago

Attached file Bug 1639153 - Part 4: Untie frame iteration from Frame::tls. r=lth (deleted) — Details

Here we replace usage of Frame::tls in frame iteration with GetNearestEffectiveTls.
We also maintain current tls for frame iteration object to not to call GetNearestEffectiveTls everytime.

Dmitry Bezhetskov

Assignee

Comment 59

•

4 years ago

Attached file Bug 1639153 - Part 5: Remove remaining uses of Frame::tls. r=lth (deleted) — Details

Here we remove remaining uses of Frame::tls. There are many places where
we use it: in debug frames, in profiling frames, in jit activation, etc.
All these places require short fixes to use our new scheme for getting
tls, so I gathered them together.

Dmitry Bezhetskov

Assignee

Comment 60

•

4 years ago

Attached file Bug 1639153 - Part 6.0: Force Ion to preserve its tls. r=lth (deleted) — Details

Let's see following code and let's assume that wasm-compiler is ion:

call foo
call bar

Ion can clobber tls inside foo and then just go with clobbered tls into bar.
It will crash if bar uses tls. At compile-time we don't know whether bar will use tls or not.

It works well when we restore tls each time when we return from a function because of the current frame structure.
But now, when we want to get rid of that Frame::tls we should guarantee that Ion doesn't clobber tls inside a function.
In fact we forces Ion to restore tls iff it clobbers it and it is a rare case.

Baseline doesn't need such logic because of its private stack slot for tls.

Dmitry Bezhetskov

Assignee

Comment 61

•

4 years ago

Attached file Bug 1639153 - Part 7: Remove Frame::tls. r=lth (deleted) — Details

We remove Frame::tls and fix prologue/epilogue. Runtime, Ion and Baseline are ready for this move already.

Dmitry Bezhetskov

Assignee

Comment 62

•

4 years ago

When I did measurement I suddenly discovered that there is some small performance regression at patch 7.
My machine is not representative at all, but it is just a little bit mysterious.
At 6 part performance of direct calls: 185
At 7 part it is: 193

I haven't seen any major performance regression at patch 7.

I started investigation, compared the codes and discovered that SM debug (--enable-debug --disable-optimize) generates code which executes faster than SM release (--disable-debug --enable-optimize --enable-release).
At 7 part:
Debug: 183
Release: 195

Debug code:

   0x2802daadd2b0      push   %rbp
   0x2802daadd2b1      mov    %rsp,%rbp
   0x2802daadd2b4      sub    $0x20,%rsp
   0x2802daadd2b8      cmp    %rsp,0x30(%r14)
   0x2802daadd2bc      jb     0x2802daadd2c4
   0x2802daadd2c2      ud2
   0x2802daadd2c4      mov    %edi,0x1c(%rsp)
   0x2802daadd2c8      xor    %eax,%eax
   0x2802daadd2ca      mov    %eax,0x18(%rsp)
   0x2802daadd2ce      xchg   %ax,%ax
   0x2802daadd2d0      cmpl   $0x0,0x38(%r14)
   0x2802daadd2d5      je     0x2802daadd2dd
   0x2802daadd2db      ud2
   0x2802daadd2dd      mov    0x1c(%rsp),%eax
   0x2802daadd2e1      test   %eax,%eax
   0x2802daadd2e3      je     0x2802daadd311
   0x2802daadd2e9      sub    $0x1,%eax
   0x2802daadd2ec      mov    %eax,0x1c(%rsp)
   0x2802daadd2f0      mov    $0x7,%edi
   0x2802daadd2f5      test   $0xf,%spl
   0x2802daadd2f9      je     0x2802daadd300
   0x2802daadd2ff      int3
   0x2802daadd300      callq  0x2802daadd0a0
   0x2802daadd305      mov    %eax,%eax
   0x2802daadd307      add    0x18(%rsp),%eax
   0x2802daadd30b      mov    %eax,0x18(%rsp)
   0x2802daadd30f      jmp    0x2802daadd2ce
   0x2802daadd311      mov    0x18(%rsp),%eax
   0x2802daadd315      add    $0x20,%rsp
   0x2802daadd319      pop    %rbp
   0x2802daadd31a      retq

Release code:

    0x1256f09c4290      push   %rbp
    0x1256f09c4291      mov    %rsp,%rbp
    0x1256f09c4294      sub    $0x20,%rsp
    0x1256f09c4298      cmp    %rsp,0x30(%r14)
    0x1256f09c429c      jb     0x1256f09c42a4 
    0x1256f09c42a2      ud2
    0x1256f09c42a4      mov    %edi,0x1c(%rsp)
    0x1256f09c42a8      xor    %eax,%eax
    0x1256f09c42aa      mov    %eax,0x18(%rsp)
    0x1256f09c42ae      xchg   %ax,%ax
    0x1256f09c42b0      cmpl   $0x0,0x38(%r14)
    0x1256f09c42b5      je     0x1256f09c42bd 
    0x1256f09c42bb      ud2
    0x1256f09c42bd      mov    0x1c(%rsp),%eax
    0x1256f09c42c1      test   %eax,%eax
    0x1256f09c42c3      je     0x1256f09c42e6 
    0x1256f09c42c9      sub    $0x1,%eax
    0x1256f09c42cc      mov    %eax,0x1c(%rsp)
    0x1256f09c42d0      mov    $0x7,%edi
    0x1256f09c42d5      callq  0x1256f09c40a0 
    0x1256f09c42da      mov    %eax,%eax
    0x1256f09c42dc      add    0x18(%rsp),%eax
    0x1256f09c42e0      mov    %eax,0x18(%rsp)
    0x1256f09c42e4      jmp    0x1256f09c42ae 
    0x1256f09c42e6      mov    0x18(%rsp),%eax
    0x1256f09c42ea      add    $0x20,%rsp
    0x1256f09c42ee      pop    %rbp           
    0x1256f09c42ef      retq

The only difference which I can see is:

test   $0xf,%spl
je     0x2802daadd300
int3

Looks like inserting this dynamic check is speeding up the code.
Looks like an alignment problem.

I'll check the SM behavior on my laptop and compare results.

Dmitry Bezhetskov

Assignee

Comment 63

•

4 years ago

If I increase the loop body:

  (func (export "run_direct") (param $count i32) (result i32)
    (local $sum i32)
    (block $END
      (loop $LOOP
        (br_if $END (i32.eqz (local.get $count)))
        (local.set $count (i32.sub (local.get $count) (i32.const 1)))
        (local.set $sum
          (i32.add (call $internal (i32.const 7))
                                   (local.get $sum)))
        (local.set $sum
          (i32.add (call $internal (i32.const 7))
                                   (local.get $sum)))
        (br $LOOP)))
    (local.get $sum))

then difference between the reference and part7 PR is very small 254.25 vs 255 so I think the previous anomaly affects only very small loops.

Dmitry Bezhetskov

Assignee

Comment 64

•

4 years ago

I also compared reference and my stack of patches on https://github.com/lars-t-hansen/embenchen in release mode with --wasm-compiler=ion
The results (lower score is better):

The first column is score for the reference and the second column is for the my stack of patches.
Looks like everything is good!

box2d:              700  691
bullet:             895  903
conditionals:       213  214
copy:               463  465
corrections:        1356 1356
fannkuch:           2334 2257
fasta:              616  616
fib:                639  612
ifs:                197  190
binarytrees:        3235 3234
memops:             979  976
primes:             1451 1454
raybench:           1430 1444
rust-fannkuch:      2189 2192
skinning:           541  539
zlib:               987  988

I also compared external and internal calls with the previous patched benchmark:

Reference:

external: 257.699951171875
internal: 254.968017578125
direct: 258.489990234375

My stack of patches:

external: 258.5400390625
internal: 234.716064453125
direct: 256.7490234375

To sum up, I can conclude that direct call aren't affected by modulo the first test, internal calls became even faster, and external calls aren't affected, so removing Frame::tls is a good independent step, imho.

Phabricator Automation

Updated

•

4 years ago

Attachment #9162440 - Attachment description: Bug 1639153 - Part 1: Reserve two slots after stack arguments for the future tls preservation. r=luke → Bug 1639153 - Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth

Phabricator Automation

Updated

•

4 years ago

Attachment #9162453 - Attachment description: Bug 1639153 - Part 2: Preserve callee and caller tls'es. r=luke → Bug 1639153 - Part 2: Preserve callee and caller tls'es. r=lth

Pulsebot

Comment 65

•

4 years ago

Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f68854b70d6a Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/296b7f5a0713 Part 2: Preserve callee and caller tls'es. r=lth

Natalia Csoregi [:nataliaCs]

Comment 66

•

4 years ago

Backed out for bustage on MacroAssembler.h

backout: https://hg.mozilla.org/integration/autoland/rev/eeed6471b7255c0d93d96ff2a202567256ac5a30

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedTaskRun=Hqmhidc8RPeFuoWEjJPfPA.0&revision=296b7f5a07133f9cbbd28ab20c1468800b0e1d39

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=312226941&repo=autoland&lineNumber=21791

[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - In file included from /builds/worker/checkouts/gecko/js/src/jit/x86/SharedICRegisters-x86.h:10:
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - /builds/worker/checkouts/gecko/js/src/jit/MacroAssembler.h(4202,5): error: use of undeclared identifier 'increaseStackOffset'
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - increaseStackOffset(wasm::FrameWithTls::sizeWithoutFrame());
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - ^
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - 1 error generated.
[task 2020-08-06T06:29:23.793Z] 06:29:23 ERROR - make[4]: *** [/builds/worker/checkouts/gecko/config/rules.mk:748: Parser.obj] Error 1
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - make[4]: Leaving directory '/builds/worker/workspace/obj-build/js/src/frontend'
[task 2020-08-06T06:29:23.793Z] 06:29:23 ERROR - make[3]: *** [/builds/worker/checkouts/gecko/config/recurse.mk:72: js/src/frontend/target-objects] Error 2
[task 2020-08-06T06:29:23.793Z] 06:29:23 INFO - make[3]: *** Waiting for unfinished jobs....

Flags: needinfo?(dbezhetskov)

Phabricator Automation

Updated

•

4 years ago

Attachment #9162715 - Attachment description: Bug 1639153 - Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=luke → Bug 1639153 - Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth

Pulsebot

Comment 67

•

4 years ago

Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/95d6c42f07fd Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/25931ec43a32 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/30af0e8c7956 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth

Phabricator Automation

Updated

•

4 years ago

Attachment #9162723 - Attachment description: Bug 1639153 - Part 4: Untie frame iteration from Frame::tls. r=luke → Bug 1639153 - Part 4: Untie frame iteration from Frame::tls. r=lth

Phabricator Automation

Updated

•

4 years ago

Attachment #9162736 - Attachment description: Bug 1639153 - Part 5: Remove remaining uses of Frame::tls. r=luke → Bug 1639153 - Part 5: Remove remaining uses of Frame::tls. r=lth

Phabricator Automation

Updated

•

4 years ago

Attachment #9162746 - Attachment description: Bug 1639153 - Part 6: Force Ion to preserve its tls. r=luke → Bug 1639153 - Part 6: Force Ion to preserve its tls. r=lth

Natalia Csoregi [:nataliaCs]

Comment 68

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/95d6c42f07fd
https://hg.mozilla.org/mozilla-central/rev/25931ec43a32
https://hg.mozilla.org/mozilla-central/rev/30af0e8c7956

Chris Fallin [:cfallin]

Updated

•

4 years ago

Regressions: 1649928

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

4 years ago

Regressions: 1657812

Lars T Hansen [:lth]

Updated

•

4 years ago

Regressions: 1657815

Lars T Hansen [:lth]

Comment 69

•

4 years ago

Unverified but plausible regression: bug 1657741.

Bogdan Tara[:bogdan_tara | bogdant]

Comment 70

•

4 years ago

Backed out 3 changesets (bug 1639153) on lth's request

Backout link: https://hg.mozilla.org/mozilla-central/rev/24fc091bbbf6f2cf776618c1dc67f3d77cbb1feb

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

Regressions: 1657741

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

Regressions: 1657692

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

Regressions: 1657686

Bogdan Tara[:bogdan_tara | bogdant]

Updated

•

4 years ago

Regressions: 1657730

Markus Stange [:mstange]

Updated

•

4 years ago

Regressions: 1657825

Markus Stange [:mstange]

Updated

•

4 years ago

Regressions: 1657917

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

status-firefox79: affected → wontfix

Dmitry Bezhetskov

Assignee

Comment 71

•

4 years ago

Attached file Bug 1639153 - Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth (deleted) — Details

We have a tls dependency in callWithABI for wasm because we call builtins via BuiltinThunk.
Last one requires that tls to be loaded. In this patch I gradually extend callWithABI interface to
pass offset to tls. This allows us to preserve tls and load it by offset in callWithABI and frees us from
the Frame::tls dependency.

Dmitry Bezhetskov

Assignee

Comment 72

•

4 years ago

Attached file Bug 1639153 - Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth (deleted) — Details

x86 has few register so to do div/mod for i64 it call the runtime
and clobber almost all gp registers including WasmTlsReg.
To be able to call c++ runtime via Builtin thunk we need to set up WasmTlsReg.
In this patch I create dependencies from MIR level to Codegen to be sure that WasmTlsReg is alive
when we call runtime.

Dmitry Bezhetskov

Assignee

Comment 73

•

4 years ago

Attached file Bug 1639153 - Part 6.3: Establish dependency from tls for arm callWithABI div/mod i64. r=lth (deleted) — Details

To be able to call c++ runtime via Builtin thunk we need to set up WasmTlsReg.
In this patch I create dependencies from MIR level to Codegen to be sure that WasmTlsReg is alive
when we call runtime in div/mod i64 for arm.

Dmitry Bezhetskov

Assignee

Comment 74

•

4 years ago

Attached file Bug 1639153 - Part 6.4: Add tls dependency for WasmTruncateToInt64 and Int64ToFloatingPoint for arm. r=lth (deleted) — Details

In this patch we add a tls dependency for the remaining nodes which use
BuiltinThunk to call c++ runtime. By ABI requirements WasmTlsReg should
be set.

Dmitry Bezhetskov

Assignee

Comment 75

•

4 years ago

Attached file Bug 1639153 - Part 6.5: Add tls dependency for WasmModD. r=lth (deleted) — Details

We generate builtin call for Mod operation for Double types, so we need
to add a tls dependency. In this patch I've added it.

Dmitry Bezhetskov

Assignee

Comment 76

•

4 years ago

Attached file Bug 1639153 - Part 6.6: Add tls dependency for truncate i32. r=lth (deleted) — Details

We generate builtin call for MTruncateToInt32 operation for floating points types,
so we need to add a tls dependency.
I inserted NYI for arm64 because Ion doesn't support arm64.

Pulsebot

Comment 77

•

4 years ago

Pushed by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f1af63fb5986 Part 6: Force Ion to preserve its tls. r=lth

Pulsebot

Comment 78

•

4 years ago

Pushed by ccoroiu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/93f3b8aa5bc5 Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth

Dorel Luca [:dluca]

Comment 79

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f1af63fb5986
https://hg.mozilla.org/mozilla-central/rev/93f3b8aa5bc5

Pulsebot

Comment 80

•

4 years ago

Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eb35fdc50799 Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth

Atila Butkovits

Comment 81

•

4 years ago

Backed outfor bustages at Lowering.cpp.

Backout link: https://hg.mozilla.org/integration/autoland/rev/f0551b411d485a75e1f60ac837768d9cf59ef91e

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedTaskRun=aZP4vkLXRi-mu-bXlaRt7Q.0&resultStatus=testfailed%2Cbusted%2Cexception&revision=eb35fdc507996da61eaceffeb9f9e9b12f3ba0ca&searchStr=spidermonkey

Failure log 1: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315285369&repo=autoland&lineNumber=3754

Failure log 2: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315285384&repo=autoland&lineNumber=28767

Pulsebot

Comment 82

•

4 years ago

Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fd8adea7aa17 Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth https://hg.mozilla.org/integration/autoland/rev/9b42e0b690b6 Part 6.3: Establish dependency from tls for arm callWithABI div/mod i64. r=lth https://hg.mozilla.org/integration/autoland/rev/69a27bde5003 Part 6.4: Add tls dependency for WasmTruncateToInt64 and Int64ToFloatingPoint for arm. r=lth https://hg.mozilla.org/integration/autoland/rev/a309060018a8 Part 6.5: Add tls dependency for WasmModD. r=lth https://hg.mozilla.org/integration/autoland/rev/1e582b17eab7 Part 6.6: Add tls dependency for truncate i32. r=lth

Bogdan Tara[:bogdan_tara | bogdant]

Comment 83

•

4 years ago

Backed out 5 changesets (bug 1639153) for bustages complaining about Lowering.cpp.

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&selectedTaskRun=dcOWhtqBQ_22acexbHEqgw.0&fromchange=6d91f5b4cc322020443d2789407a3fe0ad50091b&searchStr=build&tochange=3010f73c9eec545c42715659f3593516ef686f70

Backout link: https://hg.mozilla.org/integration/autoland/rev/3010f73c9eec545c42715659f3593516ef686f70

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=315409723&repo=autoland&lineNumber=33796

[task 2020-09-11T05:59:11.889Z] 05:59:11     INFO -  make[4]: Entering directory '/builds/worker/workspace/obj-build/js/src/jit'
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  /builds/worker/fetches/sccache/sccache /builds/worker/fetches/clang/bin/clang++ -std=gnu++17 -o Unified_cpp_js_src_jit8.o -c  -I/builds/worker/workspace/obj-build/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -DNDEBUG=1 -DTRIMMED=1 -DWASM_SUPPORTS_HUGE_MEMORY -DJS_CACHEIR_SPEW -DJS_STRUCTURED_SPEW -DJS_HAS_CTYPES -DFFI_BUILDING -DEXPORT_JS_API -DMOZ_HAS_MOZGLUE -I/builds/worker/checkouts/gecko/js/src/jit -I/builds/worker/workspace/obj-build/js/src/jit -I/builds/worker/workspace/obj-build/js/src -I/builds/worker/checkouts/gecko/js/src -I/builds/worker/workspace/obj-build/dist/include -I/builds/worker/workspace/obj-build/dist/include/nspr -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-build/js/src/js-confdefs.h -Qunused-arguments -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wempty-init-stmt -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-error=tautological-type-limit-compare -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=return-std-move -Wno-error=atomic-alignment -Wno-error=deprecated-copy -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Werror=implicit-function-declaration -Wno-noexcept-type -Wno-psabi -Wno-unknown-warning-option -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fsanitize=bool,bounds,enum,integer-divide-by-zero,object-size,pointer-overflow,return,vla-bound -fno-sanitize-recover=bool,bounds,enum,integer-divide-by-zero,object-size,pointer-overflow,return,vla-bound -fsanitize-blacklist=/builds/worker/workspace/obj-build/js/src/ubsan_blacklist.txt -fsanitize=address -fcrash-diagnostics-dir=/builds/worker/artifacts -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -Xclang -load -Xclang /builds/worker/workspace/obj-build/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -O2 -gline-tables-only -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -Werror=format -Wno-shadow  -MD -MP -MF .deps/Unified_cpp_js_src_jit8.o.pp   Unified_cpp_js_src_jit8.cpp
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  In file included from Unified_cpp_js_src_jit8.cpp:2:
[task 2020-09-11T05:59:11.896Z] 05:59:11    ERROR -  /builds/worker/checkouts/gecko/js/src/jit/Lowering.cpp:2362:16: error: unused variable 'opd' [-Werror,-Wunused-variable]
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -    MDefinition* opd = truncate->input();
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -                 ^
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  1 error generated.
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  /builds/worker/checkouts/gecko/config/rules.mk:723: recipe for target 'Unified_cpp_js_src_jit8.o' failed
[task 2020-09-11T05:59:11.896Z] 05:59:11    ERROR -  make[4]: *** [Unified_cpp_js_src_jit8.o] Error 1
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  make[4]: Leaving directory '/builds/worker/workspace/obj-build/js/src/jit'
[task 2020-09-11T05:59:11.896Z] 05:59:11     INFO -  make[4]: *** Waiting for unfinished jobs....

Pulsebot

Comment 84

•

4 years ago

Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/aad37f6cbf03 Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth

Pulsebot

Comment 85

•

4 years ago

Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e127d8c36c87 Backed out changeset aad37f6cbf03 for landing an incomplete stack of patches. https://hg.mozilla.org/integration/autoland/rev/1b94af3458ce Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/ab5825b43bb5 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/b79c89e6ac82 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/f2e486f1be17 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/62480de389ff Part 5: Remove remaining uses of Frame::tls. r=lth. CLOSED TREE

Pulsebot

Comment 86

•

4 years ago

Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/444f4554ea94 Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth

Pulsebot

Comment 87

•

4 years ago

Backout by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1fc282e54b7a Backed out 5 changesets for landing the wrong stack of patches. CLOSED TREE

Atila Butkovits

Comment 88

•

4 years ago

The backout from comment 87 is because the stack from comment 85 wasn't supposed to be landed .

Dmitry Bezhetskov

Assignee

Comment 89

•

4 years ago

@abutkovits , let's land this one https://phabricator.services.mozilla.com/D89243
The naming might be confusing but 6.5 is an internal numbering.
I want to land this 6.5 separately because actually it is independent patch.

Flags: needinfo?(dbezhetskov) → needinfo?(abutkovits)

Pulsebot

Comment 90

•

4 years ago

Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/174a622bf334 Part 6.5: Add tls dependency for WasmModD. r=lth

Atila Butkovits

Comment 91

•

4 years ago

The revision was landed: https://hg.mozilla.org/integration/autoland/rev/174a622bf334e32f911e97ca76c36b0a80d51f94

Flags: needinfo?(abutkovits)

Pulsebot

Comment 92

•

4 years ago

Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/659b24c3626c Part 6.4: Add tls dependency for WasmTruncateToInt64 and Int64ToFloatingPoint for arm. r=lth

Bogdan Tara[:bogdan_tara | bogdant]

Comment 93

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/444f4554ea94
https://hg.mozilla.org/mozilla-central/rev/174a622bf334

Andreea Pavel [:apavel]

Comment 94

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/659b24c3626c

Pulsebot

Comment 95

•

4 years ago

Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/83a3ed07220d Part 6.3: Establish dependency from tls for arm callWithABI div/mod i64. r=lth

Pulsebot

Comment 96

•

4 years ago

Pushed by ccoroiu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/09390cf1d667 Part 6.6: Add tls dependency for truncate i32. r=lth

Phabricator Automation

Updated

•

4 years ago

Attachment #9153756 - Attachment is obsolete: true

Dorel Luca [:dluca]

Comment 97

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/83a3ed07220d
https://hg.mozilla.org/mozilla-central/rev/09390cf1d667

Phabricator Automation

Updated

•

4 years ago

Attachment #9162755 - Attachment description: Bug 1639153 - Part 7: Remove Frame::tls. r=luke → Bug 1639153 - Part 7: Remove Frame::tls. r=lth

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

4 years ago

Regressions: 1664953

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

Regressions: 1664979

Jon Coppeard (:jonco) (PTO until 14th September)

Updated

•

4 years ago

Regressions: 1664045

Pulsebot

Comment 98

•

4 years ago

Backout by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/41fa4179cc8c Backed out 8 changesets (bug 1639153, bug 1664953) due to general instability on a CLOSED TREE

Dmitry Bezhetskov

Assignee

Comment 99

•

4 years ago

Attached file Bug 1639153 - Prepare runtime for the new lazy trampoline stub. r=lth (obsolete) (deleted) — Details

Create the proxy stub lazily for calls which go through tables.
This stub will preserve caller tls, load callee's pinned register state and etc.
In the followup patches we will fill them with the that logic, but right now let's concentrate on the runtime part.
We bake callee and tls into the trampoline stub to be able to do the switch in the near future.

Julien Cristau [:jcristau]

Comment 100

•

4 years ago

Backed out from beta as well, as requested by lth:
https://hg.mozilla.org/releases/mozilla-beta/rev/a9d230c1649f

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 101

•

4 years ago

https://hg.mozilla.org/mozilla-central/rev/41fa4179cc8c

Phabricator Automation

Updated

•

4 years ago

Attachment #9162746 - Attachment description: Bug 1639153 - Part 6: Force Ion to preserve its tls. r=lth → Bug 1639153 - Force Ion to preserve its tls. r=lth

Phabricator Automation

Updated

•

4 years ago

Attachment #9172413 - Attachment description: Bug 1639153 - Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth → Bug 1639153 - Untie callWithAbi from Frame::tls for Baseline. r=lth

Phabricator Automation

Updated

•

4 years ago

Attachment #9162746 - Attachment description: Bug 1639153 - Force Ion to preserve its tls. r=lth → Bug 1639153 - Part 6.0: Force Ion to preserve its tls. r=lth

Phabricator Automation

Updated

•

4 years ago

Attachment #9172413 - Attachment description: Bug 1639153 - Untie callWithAbi from Frame::tls for Baseline. r=lth → Bug 1639153 - Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth

Pulsebot

Comment 102

•

4 years ago

Pushed by btara@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0e268b8d01fe Part 6.0: Force Ion to preserve its tls. r=lth https://hg.mozilla.org/integration/autoland/rev/f0f44cfd6a07 Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth https://hg.mozilla.org/integration/autoland/rev/c5ef1152da2b Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth https://hg.mozilla.org/integration/autoland/rev/bba9e8259714 Part 6.3: Establish dependency from tls for arm callWithABI div/mod i64. r=lth https://hg.mozilla.org/integration/autoland/rev/e2105a52bcdd Part 6.4: Add tls dependency for WasmTruncateToInt64 and Int64ToFloatingPoint for arm. r=lth https://hg.mozilla.org/integration/autoland/rev/81e645d5ccfd Part 6.5: Add tls dependency for WasmModD. r=lth https://hg.mozilla.org/integration/autoland/rev/bd8381646b47 Part 6.6: Add tls dependency for truncate i32. r=lth

Atila Butkovits

Comment 103

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0e268b8d01fe
https://hg.mozilla.org/mozilla-central/rev/f0f44cfd6a07
https://hg.mozilla.org/mozilla-central/rev/c5ef1152da2b
https://hg.mozilla.org/mozilla-central/rev/bba9e8259714
https://hg.mozilla.org/mozilla-central/rev/e2105a52bcdd
https://hg.mozilla.org/mozilla-central/rev/81e645d5ccfd
https://hg.mozilla.org/mozilla-central/rev/bd8381646b47

Daniel Veditz [:dveditz]

Updated

•

4 years ago

Regressions: 1666051

Dmitry Bezhetskov

Assignee

Comment 104

•

4 years ago

Hi, could you please land the stack of patches together:
[https://phabricator.services.mozilla.com/D91081, https://phabricator.services.mozilla.com/D82881]
There are patches Bug 1639153 Part 1 - Part 5, then Part 7 and finally this one https://phabricator.services.mozilla.com/D91081.

Flags: needinfo?(abutkovits)

Cristian Brindusan [:cbrindusan]

Comment 105

•

4 years ago

Hi Dmitry, the stack of patches cannot be landed due to the following requirement "Revision is missing a Testing Policy Project Tag." .
Please check https://phabricator.services.mozilla.com/D83064#2965334 and add the Check-in Needed flag after the requirement is fulfilled.

Flags: needinfo?(dbezhetskov)

Dmitry Bezhetskov

Assignee

Comment 106

•

4 years ago

The testing policy tag was added.

Flags: needinfo?(dbezhetskov)

Narcis Beleuzu [:NarcisB]

Comment 107

•

4 years ago

Patch failed to land with :
"Details: We're sorry, Lando could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg import --no-commit -s 95 /tmp/tmpdrpaobho: applying /tmp/tmpdrpaobho

patching file js/src/wasm/WasmInstance.cpp
Hunk #1 FAILED at 1745
1 out of 1 hunks FAILED -- saving rejects to file js/src/wasm/WasmInstance.cpp.rej
abort: patch failed to apply"

Flags: needinfo?(abutkovits) → needinfo?(dbezhetskov)

Dmitry Bezhetskov

Assignee

Comment 108

•

4 years ago

Attached file Bug 1639153 - Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth (deleted) — Details

This patch makes use of the new "Baldrdash2020" ABI support in Cranelift
to support the "ABI 2020" refactor in the Wasm compiler.

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

Regressions: 1670983

Zhao Jiazhong

Updated

•

4 years ago

Regressions: 1671871

Alexandru Michis [:malexandru]

Comment 109

•

4 years ago

Dmitry, could you please add the missing testing-tags for the patches in this lando stack so that they can be landed?
https://lando.services.mozilla.com/D93190/

Dmitry Bezhetskov

Assignee

Comment 110

•

4 years ago

Hi :malexandru, I've added the missing test tags.

Flags: needinfo?(dbezhetskov)

Pulsebot

Comment 111

•

4 years ago

Pushed by ncsoregi@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/69b8eca9b1ef Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/e8164759f391 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/a072d5ee2511 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/db39bf2058e1 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/270961083337 Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/cd503f8f9a72 Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/a24b2ff5a2f1 Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth

Natalia Csoregi [:nataliaCs]

Comment 112

•

4 years ago

Backed out for spidermonkey bustage

backout: https://hg.mozilla.org/integration/autoland/rev/5b5ffbe4add97c98003026e0d9a730302161b646

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&revision=a24b2ff5a2f1a5605c6767d14417bd4f7a032586&selectedTaskRun=VsS2eOFOTeKaDHxyJHwmLA.0

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=319118736&repo=autoland&lineNumber=88070

[task 2020-10-20T11:26:36.939Z] /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/host/bin/dump_syms /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/js/src/build/libmozjs-84a1.so
[task 2020-10-20T11:26:36.939Z] Unexpected error: 'ascii' codec can't decode byte 0x90 in position 3510: ordinal not in range(128)
[task 2020-10-20T11:26:36.941Z] Traceback (most recent call last):
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 1086, in <module>
[task 2020-10-20T11:26:36.941Z] main()
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 1081, in main
[task 2020-10-20T11:26:36.941Z] dumper.Process(args[2], options.count_ctors)
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 523, in Process
[task 2020-10-20T11:26:36.941Z] self.ProcessFile(file_to_process, count_ctors=count_ctors)
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 538, in ProcessFile
[task 2020-10-20T11:26:36.941Z] file, arch_num, arch, vcs_root, dsymbundle, count_ctors=count_ctors
[task 2020-10-20T11:26:36.941Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py", line 590, in ProcessFileWork
[task 2020-10-20T11:26:36.941Z] for line in proc.stdout:
[task 2020-10-20T11:26:36.941Z] File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
[task 2020-10-20T11:26:36.941Z] return codecs.ascii_decode(input, self.errors)[0]
[task 2020-10-20T11:26:36.941Z] UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 3510: ordinal not in range(128)
[task 2020-10-20T11:26:36.960Z] Traceback (most recent call last):
[task 2020-10-20T11:26:36.960Z] File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
[task 2020-10-20T11:26:36.960Z] "main", mod_spec)
[task 2020-10-20T11:26:36.960Z] File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
[task 2020-10-20T11:26:36.960Z] exec(code, run_globals)
[task 2020-10-20T11:26:36.960Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/python/mozbuild/mozbuild/action/dumpsymbols.py", line 106, in <module>
[task 2020-10-20T11:26:36.960Z] sys.exit(main(sys.argv[1:]))
[task 2020-10-20T11:26:36.960Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/python/mozbuild/mozbuild/action/dumpsymbols.py", line 102, in main
[task 2020-10-20T11:26:36.960Z] args.count_ctors)
[task 2020-10-20T11:26:36.960Z] File "/builds/worker/workspace/sm-package/mozjs-84.0a1.0/python/mozbuild/mozbuild/action/dumpsymbols.py", line 83, in dump_symbols
[task 2020-10-20T11:26:36.960Z] out_files = subprocess.check_output(args, universal_newlines=True)
[task 2020-10-20T11:26:36.960Z] File "/usr/lib/python3.6/subprocess.py", line 336, in check_output
[task 2020-10-20T11:26:36.960Z] **kwargs).stdout
[task 2020-10-20T11:26:36.960Z] File "/usr/lib/python3.6/subprocess.py", line 418, in run
[task 2020-10-20T11:26:36.960Z] output=stdout, stderr=stderr)
[task 2020-10-20T11:26:36.960Z] subprocess.CalledProcessError: Command '['/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/_virtualenvs/init_py3/bin/python', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py', '-c', '--vcs-info', '--install-manifest=/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/_build_manifests/install/dist_include,/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/include', '-s', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/host/bin/dump_syms', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/crashreporter-symbols', '/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/js/src/build/libmozjs-84a1.so']' returned non-zero exit status 1.
[task 2020-10-20T11:26:36.960Z] Running: /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/_virtualenvs/init_py3/bin/python /builds/worker/workspace/sm-package/mozjs-84.0a1.0/toolkit/crashreporter/tools/symbolstore.py -c --vcs-info --install-manifest=/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/_build_manifests/install/dist_include,/builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/include -s /builds/worker/workspace/sm-package/mozjs-84.0a1.0 /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/host/bin/dump_syms /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/dist/crashreporter-symbols /builds/worker/workspace/sm-package/mozjs-84.0a1.0/obj-spider/js/src/build/libmozjs-84a1.so

Flags: needinfo?(dbezhetskov)

Steve Fink [:sfink] [:s:]

Comment 113

•

4 years ago

This is kind of an odd failure.

It appears that running obj-spider/dist/host/bin/dump_syms .../obj-spider/js/src/build/libmozjs-84a1.so is outputting a non-ASCII character (0x90), and python is choking on it. It's Python 3.6, and it's using univeral_newlines=True, so I would have expected this to be decoding to utf8, not ASCII, but I'm perpetually confused by this encoding stuff. Perhaps the system encoding is wrong for some reason.

I don't know if this means there's an actual symbol with a weird name, or if there was an error message with a non-ASCII character in it, or what.

I'm going to see if I can diagnose it with an interactive shell job.

Steve Fink [:sfink] [:s:]

Comment 114

•

4 years ago

This seems to be some kind of problem with the crashreporter?

Specifically: I ran the above command on an interactive loaner and saved stdout and stderr to files. The stdout file looked mostly ok, but had one line with a mangled filename:

FILE 1761 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/convert/mod.rs
FILE 1762 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/convert/num.rs
FILE 1763 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/ffi.rs
FILE 1764 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/fmt/builders.rs
FILE 1765 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/fmt/k^B@^A^E^N^C^U<90>^B^G
FILE 1766 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/fmt/mod.rs
FILE 1767 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/fmt/num.rs
FILE 1768 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/hash/mod.rs
FILE 1769 /rustc/4fb7144ed159f94491249e86d5bbd033b5d60550/src/libcore/hash/sip.rs

That's cut & paste from the interactive terminal web page. The actual bytes can be inferred, but the only one that is higher than 0x7f is the 0x90.

That directory contains 4 files. The above entry should say float.rs. The filesystem is fine; the name is only mangled in the dump_syms output.

So I'm guessing the problem has nothing to do with this bug; it probably just changed the directory listing enough to produce the particular invalid character here rather than other invalid characters.

I'm not sure who to needinfo here, and have to run right now, so I'll post what I know.

Steve Fink [:sfink] [:s:]

Updated

•

4 years ago

Depends on: 1673103

Dmitry Bezhetskov

Assignee

Comment 115

•

4 years ago

I think we can try to land the stack again because 1673103 is fixed.

Flags: needinfo?(dbezhetskov)

Pulsebot

Comment 116

•

4 years ago

Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/22c94980495f Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/3e82e75c4992 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/82fbedcdde75 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/25281e1775c9 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/1e72f84b4206 Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/c414a69233f0 Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/42728e99fbc5 Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth

Dorel Luca [:dluca]

Comment 117

•

4 years ago

Backed out 7 changesets (bug 1639153) for Spidermonkey failures in wasm/simd/cvt-x64-ion-codegen.j. CLOSED TREE

Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=320144587&repo=autoland&lineNumber=26350

Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&revision=42728e99fbc55663b2e573c822e1a0b529c325ae

Backout:
https://hg.mozilla.org/integration/autoland/rev/1c593eaefecfd789d3b3f9cc5be3f5d65441b417

Flags: needinfo?(dbezhetskov)

Dmitry Bezhetskov

Assignee

Comment 118

•

4 years ago

The issue was fixed, let's land the stack.

Flags: needinfo?(dbezhetskov)

Pulsebot

Comment 119

•

4 years ago

Pushed by nerli@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b24505537d70 Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/50772b4ff6aa Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/3cfa16c9f9a9 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/5dfb9fc15fba Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/5737845e8af2 Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/107ac71d3eb1 Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/49de28d6a75a Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth CLOSED TREE

Natalia Csoregi [:nataliaCs]

Comment 120

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b24505537d70
https://hg.mozilla.org/mozilla-central/rev/50772b4ff6aa
https://hg.mozilla.org/mozilla-central/rev/3cfa16c9f9a9
https://hg.mozilla.org/mozilla-central/rev/5dfb9fc15fba
https://hg.mozilla.org/mozilla-central/rev/5737845e8af2
https://hg.mozilla.org/mozilla-central/rev/107ac71d3eb1
https://hg.mozilla.org/mozilla-central/rev/49de28d6a75a

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

Regressions: 1674436

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

No longer regressions: 1674436

Pulsebot

Comment 121

•

4 years ago

Backout by abutkovits@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/adaa820203d1 Backed out 7 changesets for crashing WasmTrapHandler. DONTBUILD a=backout

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 122

•

4 years ago

The last 7 changesets got backed out for crashes reported by users:

[@ WasmTrapHandler ] from 14 installations when I checked
https://crash-stats.mozilla.org/report/index/f0eb42d8-158a-426a-9034-f57830201031

Please also check if these are related (2 installations, all startup crashes):
[@ core::ops::function::Fn::call<T> | std::sys_common::backtrace::__rust_end_short_backtrace<T> ]

crash stack 1: https://crash-stats.mozilla.org/report/index/655326bc-3a73-4588-adad-d1db40201031
crash stack 2: https://crash-stats.mozilla.org/report/index/85a6bb2b-2f8b-4de0-914d-821eb0201031

Flags: needinfo?(dbezhetskov)

Pulsebot

Comment 123

•

4 years ago

Pushed by abutkovits@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/48d5f5918e01 Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/6d85470024c3 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/f981949ae991 Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/c50443ca5619 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/1f21e8f610a7 Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/315a592cfd3c Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/beed4ac8993b Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth

Pulsebot

Comment 124

•

4 years ago

Backout by ccoroiu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/19ebe2dac480 Backed out 7 changesets for accidentally landing

Dmitry Bezhetskov

Assignee

Comment 125

•

4 years ago

[@ WasmTrapHandler ] from 14 installations when I checked
Hm, the main crash (https://crash-stats.mozilla.org/report/index/f0eb42d8-158a-426a-9034-f57830201031) isn't in my code, and the corresponding stack doesn't point to me, are you sure that my changes trigger the crash?
How can I reproduce it?

Please also check if these are related (2 installations, all startup crashes):
They are rust crashes on windows x64 because not implemented: No FastCall variant of Baldrdash2020 iiuc we don't support baseline+cranelift configuration on win x64.

Flags: needinfo?(dbezhetskov) → needinfo?(aryx.bugmail)

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 126

•

4 years ago

(In reply to Dmitry Bezhetskov from comment #125)

[@ WasmTrapHandler ] from 14 installations when I checked
Hm, the main crash (https://crash-stats.mozilla.org/report/index/f0eb42d8-158a-426a-9034-f57830201031) isn't in my code, and the corresponding stack doesn't point to me, are you sure that my changes trigger the crash?
How can I reproduce it?

Example urls to reproduce the issue (according to crash reports):

https://juraj-husek-game-studio.itch.io/last-mortem
https://mega.nz/ - and various sub pages (unknown if you need to be logged in)

Flags: needinfo?(aryx.bugmail)

Dmitry Bezhetskov

Assignee

Comment 127

•

4 years ago

I think I catch this one: https://crash-stats.mozilla.org/report/index/f0eb42d8-158a-426a-9034-f57830201031, at least I can reproduce a crash with my changes. I use a different workflow from this bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1657917) but hope that the underlying problem is the same.

I'm investigating ...

Dmitry Bezhetskov

Assignee

Comment 128

•

4 years ago

Finally fixed the issue.
The problem was in MacroAssembler::wasmCallIndirect when callee.which() == wasm::CalleeDesc::AsmJSTable. I made a typo when worked on part 2. I preserved WasmTlsReg twice on WasmCallerTLSOffsetBeforeCall but I needed to preserve it on WasmCaller* and on WasmCallee*.

Because of this
GetNearestEffectiveTls(frame) return some garbage value and then
Instance* instance = GetNearestEffectiveTls(frame)->instance; led to SIGSEGV.

Pulsebot

Comment 129

•

4 years ago

Pushed by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8e95ddbb6b20 Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth https://hg.mozilla.org/integration/autoland/rev/a5cd8e8ce4b5 Part 2: Preserve callee and caller tls'es. r=lth https://hg.mozilla.org/integration/autoland/rev/aeacd48f565e Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth https://hg.mozilla.org/integration/autoland/rev/cee7eaa8c402 Part 4: Untie frame iteration from Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/40031ad88dbd Part 5: Remove remaining uses of Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/5718a8d121f9 Part 7: Remove Frame::tls. r=lth https://hg.mozilla.org/integration/autoland/rev/7017395d8df9 Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth

Andreea Pavel [:apavel]

Comment 130

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8e95ddbb6b20
https://hg.mozilla.org/mozilla-central/rev/a5cd8e8ce4b5
https://hg.mozilla.org/mozilla-central/rev/aeacd48f565e
https://hg.mozilla.org/mozilla-central/rev/cee7eaa8c402
https://hg.mozilla.org/mozilla-central/rev/40031ad88dbd
https://hg.mozilla.org/mozilla-central/rev/5718a8d121f9
https://hg.mozilla.org/mozilla-central/rev/7017395d8df9

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

Regressions: 1675222

Zhao Jiazhong

Updated

•

4 years ago

Regressions: 1679010

Dmitry Bezhetskov

Assignee

Comment 131

•

4 years ago

Attached file Bug 1639153 - Implement basic logic of trampoline tls stub. r=lth (obsolete) (deleted) — Details

Fill the stub logic: preserve tls's, switch pinned register state
for callee, call callee and then restore the initial state.
Since trampoline stub produces additional interstitial frame we tag
callerFP to indicate such frames inside frame iterator.
Stack maps are also adjusted not to count non-gc slots.

Phabricator Automation

Updated

•

4 years ago

Attachment #9198411 - Attachment description: Bug 1639153 - Implement trampoline tls stub. r=lth → Bug 1639153 - Implement basic logic of trampoline tls stub. r=lth

Dmitry Bezhetskov

Assignee

Comment 132

•

4 years ago

Attached file Bug 1639153 - Add dynamic same-instance check into tls trampoline. r=lth (obsolete) (deleted) — Details

Add the dynamic check and the fastpath for call_indirect
when target instance and source instance are actually the same.

Dmitry Bezhetskov

Assignee

Comment 133

•

4 years ago

Attached file Bug 1639153 - Add instance-time same-instance check. r=lth (obsolete) (deleted) — Details

When we know that instances are equal at instantiate time we replace
pointer to the trampoline with pointer to the checked call entry.

Dmitry Bezhetskov

Assignee

Comment 134

•

4 years ago

I'm just dumping some interstitial results for our 2/3 of this long optimization.
Recently I've uploaded a few patches that have demonstrated some performance results that I want to share.
To measure the performance I used my ubuntu intel core i7 laptop with 16gb of RAM.
Also, I've used calls_bench (posted above) for tests.

Master:
external 348
internal 294
import 225
direct 140

D90992: add a simple proxy trampoline patch that actually only jumps into the real code:
external 369
internal 328
import 231
direct 142

Well, one jump introduced its cost.

D102591: implement trampoline basic logic - switch tls back and forth:
external 357
internal 391
import 231
direct 142

Trampoline logic has its cost and so internal calls get significant slowdown.

D103573: let's add a dynamic check if call_indirect actually calls a function from our instance
external 397
internal 259
import 227
direct 142

Internal calls now actually become faster, but we can do better.

D103574: at instantiating time we can check if instances are equal and replace the trampoline pointer with a pointer to the function call checked entry.
external 405
internal 209
import 226
direct 141

In the end, we get a 16% slowdown for external calls and a 29% speedup for internal calls without any significant effect for import and direct calls.
I don't know how it affects wild-industrial use cases but the numbers are looked promising.

The last part of the optimization will be changing the representation of WebAssembly.Table.Elem that we can use for speeding up table.get/set inside wasm.

Lars T Hansen [:lth]

Comment 135

•

4 years ago

14 April integration test of current patch set:

Applying the current four-patch set to this morning's mozilla-central went smoothly (one small formatting conflict).
x64-release-debug builds completed OK.
x64-release-debug jit-test runs on wasm/ and asm.js/ (without --tbpl) had 91 failures: assertions and crashes around stack traversal, reference types, and exception handling.
x86-release-debug jit-tests ditto had 80 failures, same categories
arm64-release-debug builds require --disable-cranelift for the time being, this is OK.
arm64-release-debug jit-test runs on wasm/ and asm.js/ (without --tbpl) had 130 failures: assertions and crashes as above but also some UNIMPLEMENTED simulator crashes and a crash warning about an unaligned memory access (somebody forgot to untag a pointer?)
arm-release-debug tests ditto had 114 failures, about as for arm64, including simulator crashes

It's fairly high priority to make the patches that are up for review pass at least the integration tests, so that the review sees code that is plausibly correct.

Flags: needinfo?(dbezhetskov)

Lars T Hansen [:lth]

Updated

•

4 years ago

Depends on: 1675239

Lars T Hansen [:lth]

Updated

•

4 years ago

Blocks: 1340235

Lars T Hansen [:lth]

Updated

•

4 years ago

Priority: P3 → P2

Dmitry Bezhetskov

Assignee

Comment 136

•

4 years ago

Attached file Bug 1639153 - Introduce thunk stubs for indirect calls. r=lth (obsolete) (deleted) — Details

Introduce indirect stubs for call_indirect.
Use special interstitial trampolines for:
a) call_indirect some_imported_table idx
because we don't know what are calling
b) call_indirect our_private_table wasm_func_idx
because we actually know that this is a cross-instance call_instance call

for all other cases we can use raw function pointers because
(a) we don't change the instance
and (b) our wasm ABI guarantee that after the call tls* will be in the valid state, so
the following cases use raw function calls:

call_indirect our_private_table js_func_idx
call_indirect our_private_table internal_wasm_func

Trampolines or indirect stubs needs to be bulk created only for exported functions
and for internal functions that are inserted into exported tables.

Dmitry Bezhetskov

Assignee

Comment 137

•

3 years ago

Attached file Bug 1639153 - Introduce indirect stubs to optimize call_indirect. r=lth (obsolete) (deleted) — Details

Introduce indirect stubs for call_indirect.
Use special interstitial trampolines for:
a) call_indirect some_imported_table idx
because we don't know what are calling
b) call_indirect our_private_table wasm_func_idx
because we actually know that this is a cross-instance call_instance call

for all other cases we can use raw function pointers because
(a) we don't change the instance
and (b) our wasm ABI guarantee that after the call tls* will be in the valid state, so
the following cases use raw function calls:
- call_indirect our_private_table js_func_idx
- call_indirect our_private_table internal_wasm_func

Trampolines or indirect stubs needs to be bulk created only for exported functions
and for internal functions that are inserted into exported tables.

Phabricator Automation

Updated

•

3 years ago

Attachment #9225802 - Attachment description: Bug 1639153 - Introduce thunk stubs for indirect calls. r=lth → Bug 1639153 - Introduce indirect stubs to optimize call_indirect. r=lth

Phabricator Automation

Updated

•

3 years ago

Attachment #9222051 - Attachment is obsolete: true

Phabricator Automation

Updated

•

3 years ago

Attachment #9200275 - Attachment is obsolete: true

Phabricator Automation

Updated

•

3 years ago

Attachment #9200274 - Attachment is obsolete: true

Phabricator Automation

Updated

•

3 years ago

Attachment #9198411 - Attachment is obsolete: true

Phabricator Automation

Updated

•

3 years ago

Attachment #9177105 - Attachment is obsolete: true

Lars T Hansen [:lth]

Comment 138

•

3 years ago

(Some initial notes, I will edit these as more data become available.)

Base rev: mozilla-central 7c3ea3514425. Applied the latest (Jul 8) rev of D117123. Built it with ../configure --disable-debug --enable-optimize --enable-release --enable-debug-symbols and also made a comparable build from the base rev.

The benchmark is the call_indirect_ubench.js attached above. I run the JS shell with --wasm-compiler=ion, no other switches, loading the benchmark file on the command line.

Consistently on my Xeon E5-2637 system with Fedora 33 in the JS shell, comparing with-the-patch to without-the-patch:
"external" calls (ie cross-module indirect calls) drop running time by 14% (highly surprising, these ought to be be slower)
"internal" calls (ie same-module indirect calls) drop running time by 23% (almost precisely what Dmitry has reported)
"direct" calls increase running time by 6% (very surprising, these ought to be invariant)

On a Core-i7 with macOS 11 (this is a 2018 MacBook Pro), ditto:
"external" calls slow down slightly, but really not much (very nice)
"internal" calls speed up by about 14% (so less than on the Xeon and what Dmitry reported)
"direct" calls are roughly invariant (as desired)

On an Apple M1 with macOS 11 (M1 MacBook Pro):
"external" calls slow down by a lot (running time goes from 270 to 430, ie increasing by 60%)
"internal" calls speed up by about 15% (comparable to i7)
"direct" calls are roughly invariant (as desired)

Another set of benchmarks. This is doubly-recursive fibonacci(40) using indirect calls. There are four cases:

DIR: direct calls
IIP: indirect calls, intra-module with a private table
IIE: indirect calls, intra-module but with a shared (exported) table
IIX: indirect calls, inter-module (requires two modules and a shared table)

                Baseline                         Improved
       DIR   IIP   IIE    IIX             DIR   IIP    IIE   IIX
Xeon   753   1410  1425   1425            754   1070   1300  2080
i7     647   1130  1130   1140            645    836   1100  1520
M1     650   1050  1045   1045            650    770    980  1420

On all CPUs, in the baseline case the indirect programs all have roughly the same performance, which is what we want to see, and the direct case is much faster. Also, the direct cases have the same performance with the old and the new code, as expected & desired. For the indirect cases with the improved code, we see that there's a significant cost to the stubs: the IIX case takes almost twice as long as the IIP case. That said, the IIX case is "only" about 30% - 40% slower than the IIX case of the baseline code. The IIE case shows off an optimization where we can't statically say that a call is intramodule but we can detect it dynamically (by comparing the Tls values). This optimization is very effective and will be worthwhile if we think that this case will be common.

Lars T Hansen [:lth]

Comment 139

•

3 years ago

Attached file fib.tar (deleted) — Details

Fib benchmarks.

Dmitry Bezhetskov

Assignee

Comment 140

•

3 years ago

Thanks Lars for the testing,

I've run your fib tests too on my threadripper/Ubuntu 20.04 machine with the same configurations.
I've got the following results:

        Baseline                           Improved                          Improved - dynamic check
DIR   IIP   IIE    IIX             DIR   IIP    IIE   IIX                  DIR   IIP    IIE   IIX
633   1187  1180   1256            635   849   1029  1993                  635   853   1398  1799

I've included numbers for the version where we don't do the dynamic check because imho it is a rare case and removing it improving results for IIX a little bit.

Flags: needinfo?(dbezhetskov)

Dmitry Bezhetskov

Assignee

Comment 141

•

3 years ago

I think about call_indirect usage and find this research (https://www.usenix.org/system/files/sec20-lehmann.pdf) Table 2. It looks like the number of call_indirect can be significant while almost all wasm usages are single module usages, so I think we should care less about truly external cases, wdyt?

Flags: needinfo?(lhansen)

Lars T Hansen [:lth]

Comment 142

•

3 years ago

almost all wasm usages are single module usages

I'm a little skeptical about that (is our population all wasm programs on the net or only the prominent/important ones; there are fewer of the latter, but the number of instances of the latter probably dwarfs the number of instances of the former) but I think that when a program is multi-module this will tend to be because it uses dynamic linking, and the cost is probably affordable.

I'm probably willing to believe that we can afford the extra cost of truly external calls. I think we should probably keep the dynamic check for now but once we have better heuristics for when a table is private we should reevaluate it.

Flags: needinfo?(lhansen)

Lars T Hansen [:lth]

Comment 143

•

3 years ago

I've done a little bit of memory profiling, not very systematic yet, just testing the waters.

The methodology for this experiment is to disable the wasm baseline jit (because some programs will be compiled without tiering and for the programs that are, object code caching will restore the optimized code on reload) and then load some wasm-using pages, logging the wasm machine code allocations.

Google earth w/o patch: 34668544 + 65536
Google earth w/ patch: 31260672 + 5242880 + 65536

This illustrates (a) that the patch can have a significant positive effect on the machine code size for the program itself (the 30+MB allocation goes from 34.7MB to 31.3MB), because indirect calls are represented by a more compact machine code sequence and (b) that the indirect stubs can take a significant amount of space, here about 17% of the machine code. The overall increase caused by the patch is only 5% however, due to the mentioned savings.

PSPDFKIT ("standalone" demo) w/o patch: 23265280
PSPDFKIT w/ patch: 23003136 + 1835008

Here, there are scant savings from more compact code, and though there's a relatively smaller amount of stub code for indirect calls, there is a total increase of 7% in executable memory.

ruffle.rs is a flash emulator. Go to the site, then click "demo", then select "Saturday morning watchmen" from the top right menu.

w/o patch: 1900544 + 65536 + 8126464 + 65536
w/ patch: 1835008 + 65536 + 131072 + 65536 + 8060928 + 65536 + 131072 + 262144 + 65536

Here we may see the effect of lazily creating some stubs, and/or the effect of storing the stub with the calling module. In both runs there are two large blobs, call them "first page" and "second page". During the second run, there is (131072 + 65536) extra memory allocated for the first page and (131072 + 262144 + 65536) for the second page, this memory is all for indirect stubs. All told we're seeing a 5% increase in executable memory for the program.

(Obviously there are some quantization effects in all of these numbers since only full 64KB pages can be allocated.)

I don't think these increases are terribly scary. I do think that we need to focus a bit on bringing these numbers down, if we can, by generating more compact stubs (removing the dynamic check for the intra-instance call would help) or by generating fewer (more aggressive determination of whether a table is private to an instance). I don't think that should block landing.

Lars T Hansen [:lth]

Comment 144

•

3 years ago

It's a little harder to test the autocad web app (web.autocad.com), there's probably a bug somewhere... in ion... Anyway, disable ion and enable baseline, login to the site and "stay logged in" and then bring up a blank tab, close the autocad tab, and then quit. Do this in both browsers. Now when the browser starts it should start in the blank tab. (Without this, it could look like the browser tried to load the autocad site on startup, also weird.) To test, load web.autocad.com, click on "samples", and select the roller assembly sample, this comes up quickly enough. Then Ctrl-L and type about:blank and enter, to unload the page.

Here there are many allocations, so I've captured a log and processed the log. In the end, without the patch the executable memory for wasm reaches 188.7MB at peak, and with the patch it reaches 200.0MB, a 6% increase. Since this is baseline code, we should expect the actual relative increase to be larger than that (with a measured peak being lower). If I can generate Ion numbers, I will post an update. (Edit: Ion problem tracked by bug 1728781.)

It could look like there are about 100 more allocations with the patch than without; these would be for individual pages that are needed for lazy indirect stubs, I expect. (Edit: Indeed this works out really well: 100 allocations should work out to about 7MB, there's also a separate 7.5MB allocation in the run with the patch, for 14.5MB total; meanwhile, the program's code with the patch is about 3.5MB smaller. 14.5-3.5=11, which is very close to the difference between the runs.)

Autocad was always considered a hard problem due to its structure; it's encouraging that this looks not particularly worse than other sites, so far.

Lars T Hansen [:lth]

Comment 145

•

3 years ago

Preliminary Ion numbers for web.autocad.com: without the patch the peak is 141.4MB, with the patch the peak is 150.0 MB, for a 6% increase here as well. This is OK - we want to try to shrink it, but we can live with it. Again, there are about 100 more individual lazy allocations totalling about 6.5MB, and an additional large 7.5MB eager allocation, but a 6.5MB savings on program object code.

Lars T Hansen [:lth]

Comment 146

•

3 years ago

Summarizing the memory profiling:

we are up about 6% on peak executable memory consumption across all programs
this is not great but is totally acceptable; we should however dig into these numbers a bit to look for savings
if we remove the dynamic same-instance check from the stubs the eager stub allocations will likely be more compact
once we improve the code for avoiding even eager stub allocations for known-same-instance calls, eager stub allocations may go away, though later bulk-upgrade of tables when they are discovered to not be same-instance will remove those savings in some cases
once we use luke's clever idea for lazy stubs, most executable allocations for lazy stubs will disappear because we can allocate them densely instead one on every page. this will be very helpful for eg autocad, which has many lazy allocations, but for the other applications i don't know see that it will help much.

Possible bug: I'm not sure why we need to be allocating 64KB of executable memory for each lazy stub, as this computer uses 4KB pages, but that's what it looks like. This may be worth looking into soon. We allocate in 4KB chunks and this is the best we can do, probably.

Pulsebot

Comment 147

•

3 years ago

Pushed by dbezhetskov@igalia.com: https://hg.mozilla.org/integration/autoland/rev/d6506b5ee07b Introduce indirect stubs to optimize call_indirect. r=lth

Cristina Cozmuta (:CrissCozmuta)

Comment 148

•

3 years ago

Backed out for causing SM bustages. CLOSED TREE
Backout link : https://hg.mozilla.org/integration/autoland/rev/e0721c12a97a7b3497dbd2df3570839e5ed501fd
Push with failures : https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&revision=d6506b5ee07b5c65f889ce7c2214b0172f205898&searchStr=SM&selectedTaskRun=Qb3LeVEMSjWZJXXJXSdV3Q.0
Link to failure log : https://treeherder.mozilla.org/logviewer?job_id=351620844&repo=autoland

Flags: needinfo?(dbezhetskov)

Pulsebot

Comment 149

•

3 years ago

Pushed by dbezhetskov@igalia.com: https://hg.mozilla.org/integration/autoland/rev/a0e51f584680 Introduce indirect stubs to optimize call_indirect. r=lth

Sandor Molnar

Comment 150

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a0e51f584680

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

3 years ago

Regressions: 1731031

Cristian Tuns

Comment 151

•

3 years ago

Backed out for causing Bug 1731031.

Backout link

Pulsebot

Comment 152

•

3 years ago

Pushed by dbezhetskov@igalia.com: https://hg.mozilla.org/integration/autoland/rev/1a73cfde72e8 Introduce indirect stubs to optimize call_indirect. r=lth

Dmitry Bezhetskov

Assignee

Updated

•

3 years ago

Flags: needinfo?(dbezhetskov)

Cristian Tuns

Comment 153

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1a73cfde72e8

Lars T Hansen [:lth]

Updated

•

3 years ago

Regressions: 1733040

Lars T Hansen [:lth]

Updated

•

3 years ago

Regressions: 1733016

Cristian Tuns

Comment 154

•

3 years ago

Backed out for causing the Bug 1733016.

Backout link

Cristian Tuns

Comment 155

•

3 years ago

Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/187c2cd787ba

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 156

•

3 years ago

(In reply to Cristian Tuns from comment #151)

Backed out for causing Bug 1731031.

Backout link

== Change summary for alert #31797 (as of Mon, 11 Oct 2021 11:56:06 GMT) ==

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)
2%	wasm-godot-baseline	macosx1015-64-shippable-qr	webrender	499.30 -> 487.07

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31797

Alexandru Ionescu (needinfo me) [:alexandrui]

Updated

•

3 years ago

Keywords: perf-alert

Lars T Hansen [:lth]

Comment 157

•

3 years ago

Wohoo, I can reproduce on try (second 'p' failure on https://treeherder.mozilla.org/jobs?repo=try&revision=088f448eeb74e9b3ea34cc54705ee09ebbe1dc19&group_state=expanded&selectedTaskRun=fRNFivQtQQmwxP9SMsFYYQ.0), this is the same fault in a new test case. The symptoms are the same too: we get an indirect call to null to an element that should have been initialized by an active element segment initializer. This is the third case, the other two are https://treeherder.mozilla.org/logviewer?job_id=353031198&repo=autoland&lineNumber=47748 and https://treeherder.mozilla.org/logviewer?job_id=352999489&repo=autoland&lineNumber=28275.

(I'm not really any closer to understanding what this is but I'm currently going over the patch line by line to see if there's anything I've missed.)

Lars T Hansen [:lth]

Updated

•

3 years ago

Depends on: 1742053

Lars T Hansen [:lth]

Updated

•

3 years ago

Summary: Optimize indirect calls → Optimize indirect calls by not pessimizing same-instance calls

Lars T Hansen [:lth]

Updated

•

3 years ago

Depends on: 1742930

Lars T Hansen [:lth]

Updated

•

3 years ago

No longer blocks: 1340235

Lars T Hansen [:lth]

Updated

•

3 years ago

Blocks: 1742930

No longer depends on: 1742930

Phabricator Automation

Updated

•

3 years ago

Attachment #9225802 - Attachment is obsolete: true

Lars T Hansen [:lth]

Updated

•

3 years ago

Blocks: 1743586

Lars T Hansen [:lth]

Comment 158

•

3 years ago

Remaining blocking bug is considered fixed, so closing this as fixed. Further work to be filed as individual bugs under the umbrella of bug 1742930.

Status: ASSIGNED → RESOLVED

Closed: 4 years ago → 3 years ago

Resolution: --- → FIXED

BugBot [:suhaib / :marco/ :calixte]

Updated

•

3 years ago

Keywords: leave-open

Mathew Hodson

Updated

•

3 years ago

status-firefox78: wontfix → ---

status-firefox79: wontfix → ---

Lars T Hansen [:lth]

Updated

•

3 years ago

Regressions: 1750930

Lars T Hansen [:lth]

Updated

•

3 years ago

Regressions: 1752870

Lars T Hansen [:lth]

Updated

•

3 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1753061

Lars T Hansen [:lth]

Comment 159

•

3 years ago

Bug 1742053, which landed a refactoring of the last work here, was backed out by bug 1753061. We're not going to attempt to re-land, but will be adopting a different strategy, see bug 1754377.

Lars T Hansen [:lth]

Updated

•

3 years ago

Regressions: 1715459

Mathew Hodson

Updated

•

3 years ago

Target Milestone: --- → 84 Branch

Mathew Hodson

Updated

•

3 years ago

Blocks: 1754377

Lars T Hansen [:lth]

Updated

•

3 years ago

No longer blocks: 1754377

Bug 1639153 - change call_indirect signature check to happen after frame is pushed r=lth,wingo 5 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - call_indirect pushes trampoline, jumps to checked tail entry r=lth,wingo 5 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Remove using Frame::tls in Jit import exit stub r=lth,wingo 5 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Refactor wasm::Frame to reduce number of gnarly casts. r=lth,wingo 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Interpreter entry pushes trampoline frame to preserve tls for runtime r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - [MIPS] Fix a recent refactoring to wasm::Frame on mips. r=lth 4 years ago Zhao Jiazhong (deleted), text/x-phabricator-request		Details
call_indirect_ubench.js 4 years ago Lars T Hansen [:lth] (deleted), application/x-javascript		Details
call_indirect_ubench.js 4 years ago Lars T Hansen [:lth] (deleted), application/x-javascript		Details
Bug 1639153 - Part 1: Reserve two slots after stack arguments for the future tls preservation. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 2: Preserve callee and caller tls'es. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 3: Implement the algorithm for obtaining tls and use it for wasm signal handling. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 4: Untie frame iteration from Frame::tls. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 5: Remove remaining uses of Frame::tls. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 6.0: Force Ion to preserve its tls. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 7: Remove Frame::tls. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 6.1: Untie callWithAbi from Frame::tls for Baseline. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 6.2: Establish dependency from tls for x86 callWithABI div/mod i64. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 6.3: Establish dependency from tls for arm callWithABI div/mod i64. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 6.4: Add tls dependency for WasmTruncateToInt64 and Int64ToFloatingPoint for arm. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 6.5: Add tls dependency for WasmModD. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 6.6: Add tls dependency for truncate i32. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Prepare runtime for the new lazy trampoline stub. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Part 8: Adapt Cranelift-based Wasm to use ABI-2020. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Implement basic logic of trampoline tls stub. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Add dynamic same-instance check into tls trampoline. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Add instance-time same-instance check. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Introduce thunk stubs for indirect calls. r=lth 4 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
Bug 1639153 - Introduce indirect stubs to optimize call_indirect. r=lth 3 years ago Dmitry Bezhetskov (deleted), text/x-phabricator-request		Details
fib.tar 3 years ago Lars T Hansen [:lth] (deleted), application/x-tar		Details