Closed Bug 1457703 Opened 7 years ago Closed 6 years ago

[ARM64 Hardware Only] Crash [@ js::gc::TenuredCell::arena] and various other GC signatures

Categories

(Core :: JavaScript: GC, defect, P2)

ARM64
Linux
defect

Tracking

()

RESOLVED INVALID
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 - wontfix
firefox62 - wontfix

People

(Reporter: decoder, Assigned: jonco)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [geckoview:crow])

Crash Data

Attachments

(5 files, 3 obsolete files)

I am seeing various GC crashes that appear while fuzzing on ARM64 hardware only. None of these crashes ever reproduce when I try to triage them. This bug is about collecting some of the signatures and devising strategies to find the root cause of these issues. Since RR is not available to ARM64, this won't be an easy task. To start, here is a common crash trace from mozilla-central revision 63a0e2f626fe (build with --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-debug, run with --fuzzing-safe --cpu-count=2 --spectre-mitigations=on --spectre-mitigations=off --arm-asm-nop-fill=1 --nursery-strings=on --baseline-eager --arm-sim-icache-checks): Backtrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000000000b5ee30 in js::gc::TenuredCell::arena (this=<optimized out>) at js/src/gc/Cell.h:334 #1 js::gc::TenuredCell::zoneFromAnyThread (this=<optimized out>) at js/src/gc/Cell.h:361 #2 IsAboutToBeFinalizedInternal<js::Shape> (thingp=0xffffa95fe038) at js/src/gc/Marking.cpp:3384 #3 0x0000000000b8bf74 in js::InitialShapeEntry::needsSweep (this=0xffffa9684548) at js/src/vm/Shape.h:1423 #4 JS::StructGCPolicy<js::InitialShapeEntry>::needsSweep (tp=<optimized out>) at js/src/optarm64/dist/include/js/GCPolicyAPI.h:87 #5 JS::GCHashSet<js::InitialShapeEntry, js::InitialShapeEntry, js::SystemAllocPolicy>::sweep (this=0xffffa9192818) at js/src/optarm64/dist/include/js/GCHashTable.h:263 #6 JS::WeakCache<JS::GCHashSet<js::InitialShapeEntry, js::InitialShapeEntry, js::SystemAllocPolicy> >::sweep (this=0xffffa91927f8) at js/src/optarm64/dist/include/js/GCHashTable.h:619 #7 0x0000000000b4883c in IncrementalSweepWeakCacheTask::run (this=0xfffffbb70378) at js/src/gc/GC.cpp:6127 #8 0x00000000008ee1d4 in js::GCParallelTask::runFromHelperThread (this=0xfffffbb70378, locked=...) at js/src/vm/HelperThreads.cpp:1543 #9 0x00000000008ee2b0 in js::HelperThread::handleGCParallelWorkload (this=0xffffa96170b0, locked=...) at js/src/vm/HelperThreads.cpp:1575 #10 0x00000000008eeedc in js::HelperThread::threadLoop (this=0xffffa96170b0) at js/src/vm/HelperThreads.cpp:2326 #11 0x00000000008f7960 in js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::callMain<0ul> (this=0xffffa9618070) at js/src/threading/Thread.h:242 #12 js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start (aPack=0xffffa9618070) at js/src/threading/Thread.h:235 #13 0x0000ffffaa7eefc4 in start_thread (arg=0x8f7948 <js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*)>) at pthread_create.c:335 #14 0x0000ffffaa4e32e0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89 x0 0xa95fe038 281473523376184 x1 0x1b0790 7696583165840 x2 0x805f0000 281472835452928 x3 0xfffe8 1048552 x4 0x1010100 72340172838076672 x5 0x0 0 x6 0xaa83e000 281473542512640 x7 0x0 0 x8 0x3479deb4 880402100 x9 0x3479deb4 880402100 x10 0x1f3533 2045235 x11 0x9d27f5e 164790110 x12 0x18 24 x13 0xa53bdde5 -1522803227 x14 0xb0000000 11884838885785600 x15 0x0 16777216000000000 x16 0xf302d8 15925976 x17 0xaa4efcf8 281473539046648 x18 0x9 9 x19 0x2 2 x20 0xe5b000 15052800 x21 0xa95fe040 281473523376192 x22 0x0 0 x23 0xa91927f8 281473518741496 x24 0x1 1 x25 0x0 0 x26 0xa9684540 281473523926336 x27 0xa9686000 281473523933184 x28 0xa9684558 281473523926360 x29 0xa95fdfa0 281473523376032 x30 0xb8bf74 12107636 sp 0xa95fdfa0 281473523376032 pc 0xb5ee30 <IsAboutToBeFinalizedInternal<js::Shape>(js::Shape**)+56> cpsr 0x20000000 536870912 fpcsr void fpcr 0x0 0 => 0xb5ee30 <IsAboutToBeFinalizedInternal<js::Shape>(js::Shape**)+56>: ldrb w2, [x2,#20] 0xb5ee34 <IsAboutToBeFinalizedInternal<js::Shape>(js::Shape**)+60>: cmp w2, #0x3
Here is another crash trace that I see in dozens of variations, but always with a similar crash address: Summary: Crash [@ js::Shape::updateBaseShapeAfterMovingGC] Build version: mozilla-central revision 378a8a64401f Build flags: --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --enable-debug Runtime options: --fuzzing-safe --cpu-count=2 --test-wasm-await-tier2 --nursery-strings=on Backtrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 js::Shape::updateBaseShapeAfterMovingGC (this=<optimized out>) at js/src/vm/Shape-inl.h:130 #1 JS::Zone::fixupInitialShapeTable (this=this@entry=0xffff9ebea000) at js/src/vm/Shape.cpp:2240 #2 0x0000000000e8703c in JS::Zone::fixupAfterMovingGC (this=this@entry=0xffff9ebea000) at js/src/gc/Zone.cpp:372 #3 0x0000000000e0d9b4 in js::gc::GCRuntime::updateZonePointersToRelocatedCells (this=this@entry=0xffff9f019720, zone=zone@entry=0xffff9ebea000) at js/src/gc/GC.cpp:2914 #4 0x0000000000e1d6b8 in js::gc::GCRuntime::compactPhase (this=this@entry=0xffff9f019720, reason=JS::gcreason::API, reason@entry=JS::gcreason::DEBUG_GC, sliceBudget=..., session=...) at js/src/gc/GC.cpp:6728 #5 0x0000000000e1f5a4 in js::gc::GCRuntime::incrementalCollectSlice (this=this@entry=0xffff9f019720, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC, session=...) at js/src/gc/GC.cpp:7198 #6 0x0000000000e2054c in js::gc::GCRuntime::gcCycle (this=this@entry=0xffff9f019720, nonincrementalByAPI=255, nonincrementalByAPI@entry=true, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7479 #7 0x0000000000e20ad8 in js::gc::GCRuntime::collect (this=this@entry=0xffff9f019720, nonincrementalByAPI=nonincrementalByAPI@entry=true, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7622 #8 0x0000000000e20dc8 in js::gc::GCRuntime::gc (this=this@entry=0xffff9f019720, gckind=gckind@entry=GC_SHRINK, reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7692 #9 0x0000000000e224e0 in js::gc::GCRuntime::runDebugGC (this=this@entry=0xffff9f019720) at js/src/gc/GC.cpp:8180 #10 0x0000000000e225f8 in js::gc::GCRuntime::gcIfNeededAtAllocation (this=0xffff9f019720, cx=cx@entry=0xffff9f016000) at js/src/gc/Allocator.cpp:310 #11 0x0000000000e4b50c in js::gc::GCRuntime::checkAllocatorState<(js::AllowGC)1> (this=<optimized out>, cx=0xffff9f016000, kind=js::gc::AllocKind::FUNCTION) at js/src/gc/Allocator.cpp:271 #12 0x0000000000e4b704 in js::Allocate<JSObject, (js::AllowGC)1> (cx=cx@entry=0xffff9f016000, kind=kind@entry=js::gc::AllocKind::FUNCTION, nDynamicSlots=nDynamicSlots@entry=0, heap=heap@entry=js::gc::TenuredHeap, clasp=clasp@entry=0x12ce5c0 <JSFunction::class_>) at js/src/gc/Allocator.cpp:52 #13 0x0000000000a2f170 in js::NativeObject::create (cx=0xffff9f016000, kind=js::gc::AllocKind::FUNCTION, heap=js::gc::TenuredHeap, shape=..., group=...) at js/src/vm/NativeObject-inl.h:539 #14 0x0000000000b2b454 in NewObject (cx=<optimized out>, group=..., kind=js::gc::AllocKind::FUNCTION, newKind=js::TenuredObject, initialShapeFlags=0) at js/src/vm/JSObject.cpp:731 #15 0x0000000000b2bcd4 in js::NewObjectWithClassProtoCommon (cx=<optimized out>, cx@entry=0xffff9f016000, clasp=clasp@entry=0x12ce5c0 <JSFunction::class_>, protoArg=..., protoArg@entry=..., allocKind=allocKind@entry=js::gc::AllocKind::FUNCTION, newKind=js::TenuredObject, newKind@entry=js::SingletonObject) at js/src/vm/JSObject.cpp:852 #16 0x0000000000b42434 in js::NewObjectWithClassProto (newKind=js::SingletonObject, allocKind=js::gc::AllocKind::FUNCTION, proto=..., clasp=0x12ce5c0 <JSFunction::class_>, cx=0xffff9f016000) at js/src/vm/JSObject-inl.h:692 #17 js::NewObjectWithClassProto<JSFunction> (newKind=js::SingletonObject, allocKind=js::gc::AllocKind::FUNCTION, proto=..., cx=0xffff9f016000) at js/src/vm/JSObject-inl.h:717 #18 js::NewFunctionWithProto (cx=0xffff9f016000, native=native@entry=0x0, nargs=nargs@entry=0, flags=JSFunction::INTERPRETED_NORMAL, enclosingEnv=..., enclosingEnv@entry=..., atom=..., proto=..., allocKind=js::gc::AllocKind::FUNCTION, newKind=newKind@entry=js::TenuredObject) at js/src/vm/JSFunction.cpp:2070 #19 0x000000000053f154 in js::frontend::ParserBase::newFunction (this=this@entry=0xfffff2dacef0, atom=..., atom@entry=..., kind=kind@entry=js::frontend::FunctionSyntaxKind::Statement, generatorKind=generatorKind@entry=js::GeneratorKind::NotGenerator, asyncKind=asyncKind@entry=js::FunctionAsyncKind::SyncFunction, proto=..., proto@entry=...) at js/src/frontend/Parser.cpp:2845 [...] #100 JS::Compile (cx=cx@entry=0xffff9f016000, options=..., chars=chars@entry=0xffff9f074000 u"/* -*- indent-tabs-mode: nil; js-indent-level: 2 -*- */\n\n/* This Source Code Form is subject to the terms of the Mozilla Public\n * License, v. 2.0. If a copy of the MPL was not distributed with this\n "..., length=<optimized out>, script=..., script@entry=...) at js/src/jsapi.cpp:4157 #101 0x000000000046975c in Evaluate (cx=<optimized out>, argc=<optimized out>, vp=<optimized out>) at js/src/shell/js.cpp:1934 #102 0x0000000000599b40 in js::CallJSNative (cx=0xffff9f016000, native=native@entry=0x468f60 <Evaluate(JSContext*, unsigned int, JS::Value*)>, args=...) at js/src/vm/JSContext-inl.h:280 #103 0x000000000058eab0 in js::InternalCallOrConstruct (cx=<optimized out>, cx@entry=0xffff9f016000, args=..., construct=construct@entry=js::NO_CONSTRUCT) at js/src/vm/Interpreter.cpp:467 #104 0x000000000058ee28 in InternalCall (cx=0xffff9f016000, args=...) at js/src/vm/Interpreter.cpp:516 #105 0x00000000005832e4 in js::CallFromStack (args=..., cx=<optimized out>) at js/src/vm/Interpreter.cpp:522 #106 Interpret (cx=0xffff9f016000, state=...) at js/src/vm/Interpreter.cpp:3084 #107 0x000000000058e5e8 in js::RunScript (cx=0xffff9f016000, state=...) at js/src/vm/Interpreter.cpp:417 #108 0x000000000058eb5c in js::InternalCallOrConstruct (cx=<optimized out>, cx@entry=0xffff9f016000, args=..., construct=construct@entry=js::NO_CONSTRUCT) at js/src/vm/Interpreter.cpp:489 #109 0x000000000058ee28 in InternalCall (cx=0xffff9f016000, args=...) at js/src/vm/Interpreter.cpp:516 #110 0x000000000058ef98 in js::Call (cx=<optimized out>, fval=..., fval@entry=..., thisv=..., thisv@entry=..., args=..., rval=...) at js/src/vm/Interpreter.cpp:535 #111 0x0000000000905f18 in js::jit::InvokeFunction (cx=<optimized out>, obj=..., constructing=<optimized out>, ignoresReturnValue=<optimized out>, argc=1, argv=0xc5de007bfffffff9, rval=...) at js/src/jit/VMFunctions.cpp:105 #112 0x00000000009060b8 in js::jit::InvokeFromInterpreterStub (cx=0xffff9f016000, frame=0xfffff2daea28) at js/src/jit/VMFunctions.cpp:134 #113 0x00002126e32dab7c in ?? () Backtrace stopped: previous frame inner to this frame (corrupt stack?) x0 0x4b4b4b4b 5425512962855750475 x1 0xa2448 664648 x2 0x7f259d00 -4188684655930270464 x3 0xf2da7990 281474756147600 x4 0xfc0a0 1032352 x5 0xa 10 x6 0x9ecfcce0 281473346161888 x7 0x1c5140 7696583250240 x8 0x18 24 x9 0x1a 26 x10 0x1931c0 7696583045568 x11 0x0 0 x12 0x18 24 x13 0xa53bdda4 -1522803292 x14 0x9a000000 7884192444579840 x15 0x0 16777216000000000 x16 0x12f04d8 19858648 x17 0xa01b3408 281473367880712 x18 0x1 1 x19 0xf2da7ac0 281474756147904 x20 0xf2da7a90 281474756147856 x21 0xf2da7af8 281474756147960 x22 0xf2da7aa0 281474756147872 x23 0xbad0bad1 3134241489 x24 0xf2da7ad0 281474756147920 x25 0x1a2448 7696583107656 x26 0x12ef000 19853312 x27 0x9ecfcd80 281473346162048 x28 0x1a2448 7696583107656 x29 0xf2da79f0 281474756147696 x30 0xbbeba8 12315560 sp 0xf2da79f0 281474756147696 pc 0xbbe5e4 <JS::Zone::fixupInitialShapeTable()+260> cpsr 0x20000000 536870912 fpcsr void fpcr 0x0 0 => 0xbbe5e4 <JS::Zone::fixupInitialShapeTable()+260>: ldr w1, [x0,#4] 0xbbe5e8 <JS::Zone::fixupInitialShapeTable()+264>: cmp w1, w23
Right now I am performing another fuzzing run with settings set to preserve all core dumps during the run, so we can get a developer to look at a core dump.
Whiteboard: [geckoview:crow]
(In reply to Christian Holler (:decoder) from comment #2) > Right now I am performing another fuzzing run with settings set to preserve > all core dumps during the run, so we can get a developer to look at a core > dump. I've looked at some core dumps and there are a few categories: (1) We segfault on some LDR instruction, but according to GDB everything is fine: the memory location can be accessed, is aligned correctly, contains valid data, etc. Maybe a CPU cache synchronization issue. (2) We crash because of a moved-tenured/swept-tenured poison pattern or the js::gc::Relocated magic value. One way forward is to disable the background tasks one by one (background finalization, compacting, off-thread parsing) and see if/when the crashes stop, and hopefully that will narrow it down more. I can post some patches for this.
Attached patch Run GCParallelTask on main thread (obsolete) (deleted) — — Splinter Review
This should run the GC helper tasks (except background sweeping) on the main thread. If this stops the crashes, we can be more precise. If this doesn't stop the crashes, we can disable more helper tasks.
Attachment #8971939 - Flags: feedback?(choller)
Does this bug need to be hidden?
Priority: -- → P2
(In reply to Jason Orendorff [:jorendorff] from comment #5) > Does this bug need to be hidden? Most of the crashes I am seeing are s-s.
Attached patch Patch v2 (deleted) — — Splinter Review
The crashes stopped when running all GCParallelTasks on the main thread. Here's a similar patch, but we now allow UpdatePointersTask to run off-thread again, so we can see if the crashes return.
Attachment #8971939 - Attachment is obsolete: true
Attachment #8971939 - Flags: feedback?(choller)
Attachment #8972662 - Flags: feedback?(choller)
I filed bug 1458839 to add some more assertions to the state management in GCParallelTask.
Comment on attachment 8972662 [details] [diff] [review] Patch v2 With this patch, the crashes return, so the problem is with the UpdatePointersTask.
Attachment #8972662 - Flags: feedback?(choller) → feedback+
Attached file helgrind.txt (deleted) —
I gave :sewardj access to the machine and he managed to get Valgrind/Helgrind trunk to work on the shell there and he was able to obtain this log. It might contain false positives and we are currently rebuilding the shell with --disable-jemalloc to improve, but the log already shows a race at ~UpdatePointersTask. Maybe this is coincidence but I think it is worth investigating. This is on m-c revision 2d83e1843241 with jandem's second patch from this bug.
Flags: needinfo?(jcoppeard)
(In reply to Christian Holler (:decoder) from comment #11) The ~UpdatePointersTask race is a false positive as far as I can see. It's the race on vptr described here: https://github.com/google/sanitizers/wiki/ThreadSanitizerPopularDataRaces#data-race-on-vptr So the vptr is being updated on entry to the descructor. This is only a problem when calling an inherited destructor for a derived class. In this case UpdatePointersTask is the most derived class and this write will not change the value of the vptr. We should still fix this if possible. I tried making the derived classes final but this didn't fix the problem in local testing. There are a couple of other warnings in the log file that I will try to address too. I can't see that any of them could cause this crash though.
Flags: needinfo?(jcoppeard)
(In reply to Christian Holler (:decoder) from comment #1) > Here is another crash trace that I see in dozens of variations, but always > with a similar crash address: > Program terminated with signal SIGSEGV, Segmentation fault. > #0 js::Shape::updateBaseShapeAfterMovingGC (this=<optimized out>) at > js/src/vm/Shape-inl.h:130 > #1 JS::Zone::fixupInitialShapeTable (this=this@entry=0xffff9ebea000) at > js/src/vm/Shape.cpp:2240 (In reply to Jon Coppeard (:jonco) from comment #12) > There are a couple of other warnings in the log file that I will try to > address too. I can't see that any of them could cause this crash though. Just after the middle of the Helgrind log file, there's a report of a race between updateEdge<js::Shape>, called from js::gc::MovingTracer::onShapeEdge (write, happens first) and shape, called from updateShapeAfterMovingGC (read, happens second) Given that the read stack seems somewhat similar to the crash stack (involving updateBaseShapeAfterMovingGC), could that be a relevant?
Flags: needinfo?(jcoppeard)
(In reply to Julian Seward [:jseward] from comment #13) That's one of the other ones I'm looking at. I don't *think* this is a problem - the code should work whether the read observes either the old or the new value - but I'm working on a patch for it anyway.
Flags: needinfo?(jcoppeard)
Attached patch compacting-thread-count (deleted) — — Splinter Review
There's an off-by-one error in the way GCRuntime::updateCellPointers counts how many tasks it started. This leads to statistics for the final task not being recorded, but we still join on the task because that happens in UpdatePointersTask's destructor.
Attachment #8973252 - Flags: review?(sphink)
Attached patch compacting-update-races (obsolete) (deleted) — — Splinter Review
I can see several races related to updating shapes (e.g. the one pointed out in comment 13). What happens at the moment is that when we call MaybeForwarded for a JSObject* it will not only return the forwarded object pointer if the object is forwarded, but also update the object's shape if that has been forwarded. This was done so that you can actually use the object you get back without crashes when you do anything that touches its shape. This is a race however: some trace hooks for an object make use of a different associated object (e.g. array buffer views access the array buffer object). So with this setup the shape can get updated in two places, once when the object itself is updated and once when updating another object that also references it. It turns out we don't really need to update the shape though. The only place it's used is for checking the object's slot count in various debug assertions. We can call MaybeForwarded when reading the shape in these assertions instead.
Attachment #8973257 - Flags: review?(sphink)
Attached patch compacting-run-method (obsolete) (deleted) — — Splinter Review
Patch to remove virtual GCParallelTask::run() and replace with function pointer. This works around the benign race descrived in comment 12. Overall this turned out nicer than I expected, but it still kinda sucks to have to do this.
Attachment #8973258 - Flags: review?(sphink)
I'm not confident that any of these patches will fix the underlying issue, but let's try it and see.
Comment on attachment 8973257 [details] [diff] [review] compacting-update-races Cancelling review due to crashes on try.
Attachment #8973257 - Flags: review?(sphink)
Comment on attachment 8973252 [details] [diff] [review] compacting-thread-count Review of attachment 8973252 [details] [diff] [review]: ----------------------------------------------------------------- Heh. Subtle.
Attachment #8973252 - Flags: review?(sphink) → review+
Comment on attachment 8973258 [details] [diff] [review] compacting-run-method Review of attachment 8973258 [details] [diff] [review]: ----------------------------------------------------------------- Ouch, unfortunate. I like your use of CRTP to keep it as clean as it can be, though.
Attachment #8973258 - Flags: review?(sphink) → review+
Comment on attachment 8973257 [details] [diff] [review] compacting-update-races Review of attachment 8973257 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/NativeObject-inl.h @@ +603,5 @@ > +NativeObject::slotSpanMaybeForwarded() const > +{ > + Shape* shape = MaybeForwarded(lastProperty()); > + if (shape->inDictionary()) > + return shape->base()->slotSpan(); This hasn't changed AFAICT, but just so I understand -- why don't you need to do MaybeForwarded(shape->base())->slotSpan() here?
(In reply to Steve Fink [:sfink] [:s:] (PTO June 31) from comment #22) > This hasn't changed AFAICT, but just so I understand -- why don't you need > to do MaybeForwarded(shape->base())->slotSpan() here? Shapes have already been updated by this point - see the comment here: https://searchfox.org/mozilla-central/source/js/src/gc/GC.cpp#2835
Attached patch compacting-update-races v2 (deleted) — — Splinter Review
This changes makes ArrayBufferObject use get/setFixedSlot everywhere and changes the assertions in those methods to a version that works if the shape has been forwarded. I had to add another method to set the private given the number of fixed slots. This is green on try now.
Attachment #8973257 - Attachment is obsolete: true
Attachment #8973940 - Flags: review?(sphink)
Decoder, can you try testing with the attached patches to see if they make any difference? Hopefully helgrind will report fewer races at least.
Flags: needinfo?(choller)
Attached patch compacting-run-method v2 (deleted) — — Splinter Review
Rebased patch.
Assignee: nobody → jcoppeard
Attachment #8973258 - Attachment is obsolete: true
Attachment #8973970 - Flags: review+
Jon, could these GC crashes affect x86 or 32-bit ARM platforms? This bug was filed as "ARM64 hardware only", but your patches don't appear to have any ARM64-specific code.
Component: JavaScript Engine → JavaScript: GC
Flags: needinfo?(jcoppeard)
(In reply to Chris Peterson [:cpeterson] from comment #27) It's unlikely that they affect x86 and it's possible they affect 32bit ARM platforms. I don't know for sure what the root of the problem is but the weaker memory model on ARM is probably a factor.
Flags: needinfo?(jcoppeard)
Attachment #8973940 - Flags: review?(sphink) → review+
(In reply to Jon Coppeard (:jonco) from comment #23) > (In reply to Steve Fink [:sfink] [:s:] (PTO June 31) from comment #22) > > This hasn't changed AFAICT, but just so I understand -- why don't you need > > to do MaybeForwarded(shape->base())->slotSpan() here? > > Shapes have already been updated by this point - see the comment here: > https://searchfox.org/mozilla-central/source/js/src/gc/GC.cpp#2835 Oh! Right, sorry, I was getting mixed up between pointers and pointees, and thinking "hey, you needed a forwarding check for the shape, so why not for the base shape that's updated in the same pass?"
Comment on attachment 8973252 [details] [diff] [review] compacting-thread-count [Security approval request comment] Requesting approval to land these patches as even though they might not fix this problem, they are an improvement in general and reduce the number of possible races by the compacting code. How easily could an exploit be constructed based on the patch? It's not clear that's possible so I'd say extremely difficult. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No. Which older supported branches are affected by this flaw? Everything back to FF 48. If not all supported branches, which bug introduced the flaw? Bug 1257186. Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Should be trivial if necessary. How likely is this patch to cause regressions; how much testing does it need? Unlikely.
Attachment #8973252 - Flags: sec-approval?
Comment on attachment 8973940 [details] [diff] [review] compacting-update-races v2 [Security approval request comment] Again, this patch is not necessarily a fix but removes a possible source of races. How easily could an exploit be constructed based on the patch? Very difficult. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No. Which older supported branches are affected by this flaw? Everything back to FF 48. If not all supported branches, which bug introduced the flaw? Bug 1257903. Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Should be trivial if necessary. How likely is this patch to cause regressions; how much testing does it need? Unlikely.
Attachment #8973940 - Flags: sec-approval?
Comment on attachment 8973970 [details] [diff] [review] compacting-run-method v2 [Security approval request comment] Again, this removes another possible source of races. How easily could an exploit be constructed based on the patch? Very difficult. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No. Which older supported branches are affected by this flaw? Everything back to FF 35. If not all supported branches, which bug introduced the flaw? Bug 1064578. Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Should be trivial. How likely is this patch to cause regressions; how much testing does it need? Unlikely.
Attachment #8973970 - Flags: sec-approval?
sec-approval+ for landing these patches.
Attachment #8973252 - Flags: sec-approval? → sec-approval+
Attachment #8973940 - Flags: sec-approval? → sec-approval+
Attachment #8973970 - Flags: sec-approval? → sec-approval+
This is an automated crash issue comment: Summary: Crash [@ js::ShapeTable::searchUnchecked<(js::MaybeAdding)0>] Build version: mozilla-central revision 21f09d7e7214 Build flags: --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-debug Runtime options: --fuzzing-safe --cpu-count=2 --arm-asm-nop-fill=1 --arm-sim-icache-checks --test-wasm-await-tier2 Backtrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 js::ShapeTable::searchUnchecked<(js::MaybeAdding)0> (id=..., this=<optimized out>) at js/src/vm/Shape-inl.h:292 #1 js::ShapeTable::search<(js::MaybeAdding)0> (id=..., this=<optimized out>) at js/src/vm/Shape-inl.h:371 #2 js::Shape::search<(js::MaybeAdding)0> (id=..., start=<optimized out>, cx=0xffffa4415000) at js/src/vm/Shape-inl.h:96 #3 js::Shape::search (id=..., cx=0xffffa4415000, this=<optimized out>) at js/src/vm/Shape-inl.h:48 #4 js::jit::SetNativeDataProperty<true> (cx=0xffffa4415000, obj=0x7000017e060, name=0x7000019c780, val=0xffffdf85c888) at js/src/jit/VMFunctions.cpp:1717 #5 0x000023c851a508d8 in ?? () x0 0x8 8 x1 0xbf88be52 3213409874 x2 0x0 0 x3 0x4b4b4b4b 5425512962855750475 x4 0xa44045b8 281473437418936 x5 0x0 0 x6 0x1 1 x7 0x0 0 x8 0xa4419c10 281473437506576 x9 0xa41f4280 281473435255424 x10 0x323ef0 3292912 x11 0x18 24 x12 0x18 24 x13 0x2598f2e 39423790 x14 0xb0000000 11884838885785600 x15 0x0 16777216000000000 x16 0x7d4bf0 8211440 x17 0x1fffc 131068 x18 0xa447b070 281473437905008 x19 0x1edd80 7696583417216 x20 0xf46000 16015360 x21 0x17e060 7696582959200 x22 0xdf85c888 281474431830152 x23 0x19c780 7696583083904 x24 0xa4415000 281473437487104 x25 0xef7000 15691776 x26 0xa3fea0b8 281473433116856 x27 0x0 0 x28 0xdf85c800 281474431830016 x29 0xdf85c7a0 281474431829920 x30 0x51a508d8 39343270201560 sp 0xdf85c7a0 281474431829920 pc 0x7d4c68 <js::jit::SetNativeDataProperty<true>(JSContext*, JSObject*, js::PropertyName*, JS::Value*)+120> cpsr 0x80000000 -2147483648 fpcsr void fpcr 0x0 0 => 0x7d4c68 <js::jit::SetNativeDataProperty<true>(JSContext*, JSObject*, js::PropertyName*, JS::Value*)+120>: ldr w2, [x3] 0x7d4c6c <js::jit::SetNativeDataProperty<true>(JSContext*, JSObject*, js::PropertyName*, JS::Value*)+124>: ldr x5, [x3,#16] Looks like there are still crashes, even with the patches here landed. I will try to find some test that we can run Helgrind on, maybe it shows something else.
Flags: needinfo?(choller)
(In reply to Christian Holler (:decoder) from comment #36) > Looks like there are still crashes, even with the patches here landed. Yes, the compacting-update-races patch has caused a couple more fuzz bugs. Overall it's still a simplification of our approach and worth doing though.
So I ran Helgrind on another random GC crashing test and caught this race report: ==18222== Possible data race during write of size 8 at 0x5DFF8E0 by thread #4 ==18222== Locks held: none ==18222== at 0xDE4908: bitwiseOrRangeInto (Bitmap.h:53) ==18222== by 0xDE4908: AddBitmapToChunkMarkBits<js::DenseBitmap> (AtomMarking.cpp:125) ==18222== by 0xDE4908: js::gc::AtomMarkingRuntime::updateChunkMarkBits(JSRuntime*) (AtomMarking.cpp:148) ==18222== by 0xDF3BDF: UpdateAtomsBitmap(js::GCParallelTask*) (GC.cpp:5400) ==18222== by 0xAA8A1B: runTask (GCParallelTask.h:127) ==18222== by 0xAA8A1B: js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1586) ==18222== by 0xAAEFD3: js::HelperThread::handleGCParallelWorkload(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1612) ==18222== by 0xAA74EF: js::HelperThread::threadLoop() (HelperThreads.cpp:2385) ==18222== by 0xAA761B: js::HelperThread::ThreadMain(void*) (HelperThreads.cpp:1869) ==18222== by 0xABCC07: callMain<0ul> (Thread.h:242) ==18222== by 0xABCC07: js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) (Thread.h:235) ==18222== by 0x484BA37: mythread_wrapper (hg_intercepts.c:389) ==18222== by 0x486AFC3: start_thread (pthread_create.c:335) ==18222== by 0x4BDA2DF: thread_start (clone.S:89) ==18222== ==18222== This conflicts with a previous read of size 8 by thread #3 ==18222== Locks held: none ==18222== at 0xBC7B04: markBit (Heap.h:611) ==18222== by 0xBC7B04: isMarkedAny (Heap.h:615) ==18222== by 0xBC7B04: js::gc::TenuredCell::isMarkedAny() const (Cell.h:286) ==18222== by 0xE3D96B: js::gc::IsAboutToBeFinalizedDuringSweep(js::gc::TenuredCell&) (Marking.cpp:3389) ==18222== by 0xE5F5AF: operator()<JSString> (Marking.cpp:3423) ==18222== by 0xE5F5AF: decltype ({parm#1}(static_cast<JSString*>((decltype(nullptr))0), (Forward<bool*>)({parm#3}))) js::DispatchTyped<IsAboutToBeFinalizedInternalFunctor<jsid>, bool*>(IsAboutToBeFinalizedInternalFunctor<jsid>, jsid const&, bool*&&) (Id.h:228) ==18222== by 0xE5F607: IsAboutToBeFinalizedInternal<jsid> (Marking.cpp:3433) ==18222== by 0xE5F607: bool js::gc::IsAboutToBeFinalizedUnbarriered<jsid>(jsid*) (Marking.cpp:3458) ==18222== by 0xB64D0B: needsSweep (ObjectGroup.cpp:1083) ==18222== by 0xB64D0B: needsSweep (GCPolicyAPI.h:87) ==18222== by 0xB64D0B: needsSweep (ObjectGroup.cpp:1772) ==18222== by 0xB64D0B: sweep (GCHashTable.h:83) ==18222== by 0xB64D0B: js::ObjectGroupCompartment::sweep() (ObjectGroup.cpp:1790) ==18222== by 0xDF5D17: SweepObjectGroups(js::GCParallelTask*) (GC.cpp:5422) ==18222== by 0xAA8A1B: runTask (GCParallelTask.h:127) ==18222== by 0xAA8A1B: js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1586) ==18222== by 0xAAEFD3: js::HelperThread::handleGCParallelWorkload(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1612) ==18222== Address 0x5dff8e0 is in a rw- anonymous segment This is on m-c rev 45ec8fd380dd with a debug build. There was another race related to wasm that I will ask :bbouvier about because it didn't look related to this bug. Let me know if this looks related/helpful.
Flags: needinfo?(jcoppeard)
(In reply to Christian Holler (:decoder) from comment #38) > ==18222== This conflicts with a previous read of size 8 by thread #3 > ==18222== Locks held: none > ==18222== at 0xBC7B04: markBit (Heap.h:611) I see that markBit() is marked MOZ_TSAN_BLACKLIST, so presumably it's a known race. Is it harmless, though?
(In reply to Julian Seward [:jseward] from comment #39) > I see that markBit() is marked MOZ_TSAN_BLACKLIST, so presumably it's a > known race. Is it harmless, though? The race that prompted this annotation was between gray unmarking a cell on the main thread (writing) and sweeping a different cell on a background thread (reading), where the mark bits for different cells end up in the same word. This is a different issue. I'm going to spin this off into a separate bug.
Flags: needinfo?(jcoppeard)
I don't do this often... but https://imgflip.com/i/2abxi0
(In reply to Christian Holler (:decoder) from comment #41) Bug 1424934 could remedy this situation.
I notice ARM64 specific JIT crashes in Bug 1461480 that fault at 1MB aligned addresses. Probably unrelated.
Jon, can you update on the status of this bug as it applies to non-trunk branches? Are these general GC issues that the ARM64 hardware just seems to trigger, an ARM64-only issue, or something else? Just trying to understand what if anything we need to do with respect to backports here.
Flags: needinfo?(jcoppeard)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #44) These are not ARM64-specific issues so we should backport.
Flags: needinfo?(jcoppeard)
(In reply to Christian Holler (:decoder) from comment #38) I filed bug 1461619 for race reported by Helgrind in comment 38.
Comment on attachment 8973252 [details] [diff] [review] compacting-thread-count Approval Request Comment [Feature/Bug causing the regression]: Bug 1257186. [User impact if declined]: Possible crash / security vulnerability. [Is this code covered by automated tests?]: Yes. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: This is a very simple change and is covered by assertions in the patch. [String changes made/needed]: None.
Attachment #8973252 - Flags: approval-mozilla-beta?
Comment on attachment 8973970 [details] [diff] [review] compacting-run-method v2 Approval Request Comment [Feature/Bug causing the regression]: Bug 1064578. [User impact if declined]: Possible crash / security vulnerability. [Is this code covered by automated tests?]: Yes. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: Low risk. [Why is the change risky/not risky?]: This is a fairly mechanical change that only affects the GC. [String changes made/needed]: None.
Attachment #8973970 - Flags: approval-mozilla-beta?
(In reply to Jon Coppeard (:jonco) from comment #45) I'm not sure whether we should backport the compacting-update-races patch. This is a complicated change that has already required a couple of fixes. Given that we didn't have any evidence that this was causing problems, I'm inclined to let it ride. The other patches are much simpler and are worth backporting just in case.
Comment on attachment 8973970 [details] [diff] [review] compacting-run-method v2 Cancelling approval request and moving to bug 1465108.
Attachment #8973970 - Flags: approval-mozilla-beta?
Comment on attachment 8973252 [details] [diff] [review] compacting-thread-count Cancelling approval request and moving to bug 1465108.
Attachment #8973252 - Flags: approval-mozilla-beta?
Attachment #8973252 - Flags: checkin+
Attachment #8973940 - Flags: checkin+
Attachment #8973970 - Flags: checkin+
This bug is getting to be difficult to track given how patches are landing. I'm going to propose that we keep this bug open as a meta bug at this point and file new bugs blocking this one for any subsequent patches that need to land. Also dropping tracking+ status for the various releases for that reason (we can tracking+ the new bugs as they come in). Christian, how are things looking these days?
Flags: needinfo?(choller)
I'm still seeing the random GC crashes unfortunately (see comment 36). I'm trying to get TSan to work and use Valgrind/Helgrind on more of the non-reproducing tests, but I hit difficulties with both tools due to the architecture: TSan is currently just hanging on startup (I've contacted the developers about this) and Helgrind is either not detecting new races or also hanging in the middle of some tests. I'll ask Julien for help as well.
Flags: needinfo?(choller)
I mentioned this to some people before, but I feel like it should be documented here. In conjunction with the crashes, I see a lot of these in dmesg: [4838736.964530] Unhandled fault: TLB conflict abort (0x92000030) at 0x0000ffff98042000 Not sure if this is the result of us crashing or the cause.
Depends on: 1461619
(In reply to Christian Holler (:decoder) from comment #54) > [4838736.964530] Unhandled fault: TLB conflict abort (0x92000030) at > 0x0000ffff98042000 It sounds like this is due to incorrect TLB maintenance, so more of an OS-level problem.
Marking this fix-optional for 62 to remove it from triage, since there are crashes visible in crash-stats for 62 with this signature in the last month, and this is kind of a placeholder/meta bug now. If anything comes up and we have a patch for 62/63, it should show up in a dependent bug.
I'm closing this as invalid based on comment 55.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INVALID
Group: javascript-core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: