Closed
Bug 1457703
Opened 7 years ago
Closed 6 years ago
[ARM64 Hardware Only] Crash [@ js::gc::TenuredCell::arena] and various other GC signatures
Categories
(Core :: JavaScript: GC, defect, P2)
Tracking
()
People
(Reporter: decoder, Assigned: jonco)
References
(Blocks 1 open bug)
Details
(4 keywords, Whiteboard: [geckoview:crow])
Crash Data
Attachments
(5 files, 3 obsolete files)
(deleted),
patch
|
decoder
:
feedback+
|
Details | Diff | Splinter Review |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
sfink
:
review+
abillings
:
sec-approval+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
sfink
:
review+
abillings
:
sec-approval+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jonco
:
review+
abillings
:
sec-approval+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
I am seeing various GC crashes that appear while fuzzing on ARM64 hardware only. None of these crashes ever reproduce when I try to triage them.
This bug is about collecting some of the signatures and devising strategies to find the root cause of these issues. Since RR is not available to ARM64, this won't be an easy task.
To start, here is a common crash trace from mozilla-central revision 63a0e2f626fe (build with --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-debug, run with --fuzzing-safe --cpu-count=2 --spectre-mitigations=on --spectre-mitigations=off --arm-asm-nop-fill=1 --nursery-strings=on --baseline-eager --arm-sim-icache-checks):
Backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000b5ee30 in js::gc::TenuredCell::arena (this=<optimized out>) at js/src/gc/Cell.h:334
#1 js::gc::TenuredCell::zoneFromAnyThread (this=<optimized out>) at js/src/gc/Cell.h:361
#2 IsAboutToBeFinalizedInternal<js::Shape> (thingp=0xffffa95fe038) at js/src/gc/Marking.cpp:3384
#3 0x0000000000b8bf74 in js::InitialShapeEntry::needsSweep (this=0xffffa9684548) at js/src/vm/Shape.h:1423
#4 JS::StructGCPolicy<js::InitialShapeEntry>::needsSweep (tp=<optimized out>) at js/src/optarm64/dist/include/js/GCPolicyAPI.h:87
#5 JS::GCHashSet<js::InitialShapeEntry, js::InitialShapeEntry, js::SystemAllocPolicy>::sweep (this=0xffffa9192818) at js/src/optarm64/dist/include/js/GCHashTable.h:263
#6 JS::WeakCache<JS::GCHashSet<js::InitialShapeEntry, js::InitialShapeEntry, js::SystemAllocPolicy> >::sweep (this=0xffffa91927f8) at js/src/optarm64/dist/include/js/GCHashTable.h:619
#7 0x0000000000b4883c in IncrementalSweepWeakCacheTask::run (this=0xfffffbb70378) at js/src/gc/GC.cpp:6127
#8 0x00000000008ee1d4 in js::GCParallelTask::runFromHelperThread (this=0xfffffbb70378, locked=...) at js/src/vm/HelperThreads.cpp:1543
#9 0x00000000008ee2b0 in js::HelperThread::handleGCParallelWorkload (this=0xffffa96170b0, locked=...) at js/src/vm/HelperThreads.cpp:1575
#10 0x00000000008eeedc in js::HelperThread::threadLoop (this=0xffffa96170b0) at js/src/vm/HelperThreads.cpp:2326
#11 0x00000000008f7960 in js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::callMain<0ul> (this=0xffffa9618070) at js/src/threading/Thread.h:242
#12 js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start (aPack=0xffffa9618070) at js/src/threading/Thread.h:235
#13 0x0000ffffaa7eefc4 in start_thread (arg=0x8f7948 <js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*)>) at pthread_create.c:335
#14 0x0000ffffaa4e32e0 in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:89
x0 0xa95fe038 281473523376184
x1 0x1b0790 7696583165840
x2 0x805f0000 281472835452928
x3 0xfffe8 1048552
x4 0x1010100 72340172838076672
x5 0x0 0
x6 0xaa83e000 281473542512640
x7 0x0 0
x8 0x3479deb4 880402100
x9 0x3479deb4 880402100
x10 0x1f3533 2045235
x11 0x9d27f5e 164790110
x12 0x18 24
x13 0xa53bdde5 -1522803227
x14 0xb0000000 11884838885785600
x15 0x0 16777216000000000
x16 0xf302d8 15925976
x17 0xaa4efcf8 281473539046648
x18 0x9 9
x19 0x2 2
x20 0xe5b000 15052800
x21 0xa95fe040 281473523376192
x22 0x0 0
x23 0xa91927f8 281473518741496
x24 0x1 1
x25 0x0 0
x26 0xa9684540 281473523926336
x27 0xa9686000 281473523933184
x28 0xa9684558 281473523926360
x29 0xa95fdfa0 281473523376032
x30 0xb8bf74 12107636
sp 0xa95fdfa0 281473523376032
pc 0xb5ee30 <IsAboutToBeFinalizedInternal<js::Shape>(js::Shape**)+56>
cpsr 0x20000000 536870912
fpcsr void
fpcr 0x0 0
=> 0xb5ee30 <IsAboutToBeFinalizedInternal<js::Shape>(js::Shape**)+56>: ldrb w2, [x2,#20]
0xb5ee34 <IsAboutToBeFinalizedInternal<js::Shape>(js::Shape**)+60>: cmp w2, #0x3
Reporter | ||
Comment 1•7 years ago
|
||
Here is another crash trace that I see in dozens of variations, but always with a similar crash address:
Summary: Crash [@ js::Shape::updateBaseShapeAfterMovingGC]
Build version: mozilla-central revision 378a8a64401f
Build flags: --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --enable-debug
Runtime options: --fuzzing-safe --cpu-count=2 --test-wasm-await-tier2 --nursery-strings=on
Backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 js::Shape::updateBaseShapeAfterMovingGC (this=<optimized out>) at js/src/vm/Shape-inl.h:130
#1 JS::Zone::fixupInitialShapeTable (this=this@entry=0xffff9ebea000) at js/src/vm/Shape.cpp:2240
#2 0x0000000000e8703c in JS::Zone::fixupAfterMovingGC (this=this@entry=0xffff9ebea000) at js/src/gc/Zone.cpp:372
#3 0x0000000000e0d9b4 in js::gc::GCRuntime::updateZonePointersToRelocatedCells (this=this@entry=0xffff9f019720, zone=zone@entry=0xffff9ebea000) at js/src/gc/GC.cpp:2914
#4 0x0000000000e1d6b8 in js::gc::GCRuntime::compactPhase (this=this@entry=0xffff9f019720, reason=JS::gcreason::API, reason@entry=JS::gcreason::DEBUG_GC, sliceBudget=..., session=...) at js/src/gc/GC.cpp:6728
#5 0x0000000000e1f5a4 in js::gc::GCRuntime::incrementalCollectSlice (this=this@entry=0xffff9f019720, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC, session=...) at js/src/gc/GC.cpp:7198
#6 0x0000000000e2054c in js::gc::GCRuntime::gcCycle (this=this@entry=0xffff9f019720, nonincrementalByAPI=255, nonincrementalByAPI@entry=true, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7479
#7 0x0000000000e20ad8 in js::gc::GCRuntime::collect (this=this@entry=0xffff9f019720, nonincrementalByAPI=nonincrementalByAPI@entry=true, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7622
#8 0x0000000000e20dc8 in js::gc::GCRuntime::gc (this=this@entry=0xffff9f019720, gckind=gckind@entry=GC_SHRINK, reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7692
#9 0x0000000000e224e0 in js::gc::GCRuntime::runDebugGC (this=this@entry=0xffff9f019720) at js/src/gc/GC.cpp:8180
#10 0x0000000000e225f8 in js::gc::GCRuntime::gcIfNeededAtAllocation (this=0xffff9f019720, cx=cx@entry=0xffff9f016000) at js/src/gc/Allocator.cpp:310
#11 0x0000000000e4b50c in js::gc::GCRuntime::checkAllocatorState<(js::AllowGC)1> (this=<optimized out>, cx=0xffff9f016000, kind=js::gc::AllocKind::FUNCTION) at js/src/gc/Allocator.cpp:271
#12 0x0000000000e4b704 in js::Allocate<JSObject, (js::AllowGC)1> (cx=cx@entry=0xffff9f016000, kind=kind@entry=js::gc::AllocKind::FUNCTION, nDynamicSlots=nDynamicSlots@entry=0, heap=heap@entry=js::gc::TenuredHeap, clasp=clasp@entry=0x12ce5c0 <JSFunction::class_>) at js/src/gc/Allocator.cpp:52
#13 0x0000000000a2f170 in js::NativeObject::create (cx=0xffff9f016000, kind=js::gc::AllocKind::FUNCTION, heap=js::gc::TenuredHeap, shape=..., group=...) at js/src/vm/NativeObject-inl.h:539
#14 0x0000000000b2b454 in NewObject (cx=<optimized out>, group=..., kind=js::gc::AllocKind::FUNCTION, newKind=js::TenuredObject, initialShapeFlags=0) at js/src/vm/JSObject.cpp:731
#15 0x0000000000b2bcd4 in js::NewObjectWithClassProtoCommon (cx=<optimized out>, cx@entry=0xffff9f016000, clasp=clasp@entry=0x12ce5c0 <JSFunction::class_>, protoArg=..., protoArg@entry=..., allocKind=allocKind@entry=js::gc::AllocKind::FUNCTION, newKind=js::TenuredObject, newKind@entry=js::SingletonObject) at js/src/vm/JSObject.cpp:852
#16 0x0000000000b42434 in js::NewObjectWithClassProto (newKind=js::SingletonObject, allocKind=js::gc::AllocKind::FUNCTION, proto=..., clasp=0x12ce5c0 <JSFunction::class_>, cx=0xffff9f016000) at js/src/vm/JSObject-inl.h:692
#17 js::NewObjectWithClassProto<JSFunction> (newKind=js::SingletonObject, allocKind=js::gc::AllocKind::FUNCTION, proto=..., cx=0xffff9f016000) at js/src/vm/JSObject-inl.h:717
#18 js::NewFunctionWithProto (cx=0xffff9f016000, native=native@entry=0x0, nargs=nargs@entry=0, flags=JSFunction::INTERPRETED_NORMAL, enclosingEnv=..., enclosingEnv@entry=..., atom=..., proto=..., allocKind=js::gc::AllocKind::FUNCTION, newKind=newKind@entry=js::TenuredObject) at js/src/vm/JSFunction.cpp:2070
#19 0x000000000053f154 in js::frontend::ParserBase::newFunction (this=this@entry=0xfffff2dacef0, atom=..., atom@entry=..., kind=kind@entry=js::frontend::FunctionSyntaxKind::Statement, generatorKind=generatorKind@entry=js::GeneratorKind::NotGenerator, asyncKind=asyncKind@entry=js::FunctionAsyncKind::SyncFunction, proto=..., proto@entry=...) at js/src/frontend/Parser.cpp:2845
[...]
#100 JS::Compile (cx=cx@entry=0xffff9f016000, options=..., chars=chars@entry=0xffff9f074000 u"/* -*- indent-tabs-mode: nil; js-indent-level: 2 -*- */\n\n/* This Source Code Form is subject to the terms of the Mozilla Public\n * License, v. 2.0. If a copy of the MPL was not distributed with this\n "..., length=<optimized out>, script=..., script@entry=...) at js/src/jsapi.cpp:4157
#101 0x000000000046975c in Evaluate (cx=<optimized out>, argc=<optimized out>, vp=<optimized out>) at js/src/shell/js.cpp:1934
#102 0x0000000000599b40 in js::CallJSNative (cx=0xffff9f016000, native=native@entry=0x468f60 <Evaluate(JSContext*, unsigned int, JS::Value*)>, args=...) at js/src/vm/JSContext-inl.h:280
#103 0x000000000058eab0 in js::InternalCallOrConstruct (cx=<optimized out>, cx@entry=0xffff9f016000, args=..., construct=construct@entry=js::NO_CONSTRUCT) at js/src/vm/Interpreter.cpp:467
#104 0x000000000058ee28 in InternalCall (cx=0xffff9f016000, args=...) at js/src/vm/Interpreter.cpp:516
#105 0x00000000005832e4 in js::CallFromStack (args=..., cx=<optimized out>) at js/src/vm/Interpreter.cpp:522
#106 Interpret (cx=0xffff9f016000, state=...) at js/src/vm/Interpreter.cpp:3084
#107 0x000000000058e5e8 in js::RunScript (cx=0xffff9f016000, state=...) at js/src/vm/Interpreter.cpp:417
#108 0x000000000058eb5c in js::InternalCallOrConstruct (cx=<optimized out>, cx@entry=0xffff9f016000, args=..., construct=construct@entry=js::NO_CONSTRUCT) at js/src/vm/Interpreter.cpp:489
#109 0x000000000058ee28 in InternalCall (cx=0xffff9f016000, args=...) at js/src/vm/Interpreter.cpp:516
#110 0x000000000058ef98 in js::Call (cx=<optimized out>, fval=..., fval@entry=..., thisv=..., thisv@entry=..., args=..., rval=...) at js/src/vm/Interpreter.cpp:535
#111 0x0000000000905f18 in js::jit::InvokeFunction (cx=<optimized out>, obj=..., constructing=<optimized out>, ignoresReturnValue=<optimized out>, argc=1, argv=0xc5de007bfffffff9, rval=...) at js/src/jit/VMFunctions.cpp:105
#112 0x00000000009060b8 in js::jit::InvokeFromInterpreterStub (cx=0xffff9f016000, frame=0xfffff2daea28) at js/src/jit/VMFunctions.cpp:134
#113 0x00002126e32dab7c in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
x0 0x4b4b4b4b 5425512962855750475
x1 0xa2448 664648
x2 0x7f259d00 -4188684655930270464
x3 0xf2da7990 281474756147600
x4 0xfc0a0 1032352
x5 0xa 10
x6 0x9ecfcce0 281473346161888
x7 0x1c5140 7696583250240
x8 0x18 24
x9 0x1a 26
x10 0x1931c0 7696583045568
x11 0x0 0
x12 0x18 24
x13 0xa53bdda4 -1522803292
x14 0x9a000000 7884192444579840
x15 0x0 16777216000000000
x16 0x12f04d8 19858648
x17 0xa01b3408 281473367880712
x18 0x1 1
x19 0xf2da7ac0 281474756147904
x20 0xf2da7a90 281474756147856
x21 0xf2da7af8 281474756147960
x22 0xf2da7aa0 281474756147872
x23 0xbad0bad1 3134241489
x24 0xf2da7ad0 281474756147920
x25 0x1a2448 7696583107656
x26 0x12ef000 19853312
x27 0x9ecfcd80 281473346162048
x28 0x1a2448 7696583107656
x29 0xf2da79f0 281474756147696
x30 0xbbeba8 12315560
sp 0xf2da79f0 281474756147696
pc 0xbbe5e4 <JS::Zone::fixupInitialShapeTable()+260>
cpsr 0x20000000 536870912
fpcsr void
fpcr 0x0 0
=> 0xbbe5e4 <JS::Zone::fixupInitialShapeTable()+260>: ldr w1, [x0,#4]
0xbbe5e8 <JS::Zone::fixupInitialShapeTable()+264>: cmp w1, w23
Reporter | ||
Comment 2•7 years ago
|
||
Right now I am performing another fuzzing run with settings set to preserve all core dumps during the run, so we can get a developer to look at a core dump.
Updated•7 years ago
|
Whiteboard: [geckoview:crow]
Updated•7 years ago
|
Blocks: arm64-baseline
Comment 3•7 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #2)
> Right now I am performing another fuzzing run with settings set to preserve
> all core dumps during the run, so we can get a developer to look at a core
> dump.
I've looked at some core dumps and there are a few categories:
(1) We segfault on some LDR instruction, but according to GDB everything is fine: the memory location can be accessed, is aligned correctly, contains valid data, etc. Maybe a CPU cache synchronization issue.
(2) We crash because of a moved-tenured/swept-tenured poison pattern or the js::gc::Relocated magic value.
One way forward is to disable the background tasks one by one (background finalization, compacting, off-thread parsing) and see if/when the crashes stop, and hopefully that will narrow it down more. I can post some patches for this.
Comment 4•7 years ago
|
||
This should run the GC helper tasks (except background sweeping) on the main thread.
If this stops the crashes, we can be more precise. If this doesn't stop the crashes, we can disable more helper tasks.
Attachment #8971939 -
Flags: feedback?(choller)
Reporter | ||
Comment 6•7 years ago
|
||
(In reply to Jason Orendorff [:jorendorff] from comment #5)
> Does this bug need to be hidden?
Most of the crashes I am seeing are s-s.
Comment 8•7 years ago
|
||
The crashes stopped when running all GCParallelTasks on the main thread.
Here's a similar patch, but we now allow UpdatePointersTask to run off-thread again, so we can see if the crashes return.
Attachment #8971939 -
Attachment is obsolete: true
Attachment #8971939 -
Flags: feedback?(choller)
Attachment #8972662 -
Flags: feedback?(choller)
Assignee | ||
Comment 9•7 years ago
|
||
I filed bug 1458839 to add some more assertions to the state management in GCParallelTask.
Updated•7 years ago
|
status-firefox59:
--- → wontfix
status-firefox60:
--- → wontfix
status-firefox-esr52:
--- → wontfix
status-firefox-esr60:
--- → affected
Updated•7 years ago
|
Keywords: csectype-uaf,
sec-high
Reporter | ||
Comment 10•7 years ago
|
||
Comment on attachment 8972662 [details] [diff] [review]
Patch v2
With this patch, the crashes return, so the problem is with the UpdatePointersTask.
Attachment #8972662 -
Flags: feedback?(choller) → feedback+
Reporter | ||
Comment 11•7 years ago
|
||
I gave :sewardj access to the machine and he managed to get Valgrind/Helgrind trunk to work on the shell there and he was able to obtain this log. It might contain false positives and we are currently rebuilding the shell with --disable-jemalloc to improve, but the log already shows a race at ~UpdatePointersTask. Maybe this is coincidence but I think it is worth investigating.
This is on m-c revision 2d83e1843241 with jandem's second patch from this bug.
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #11)
The ~UpdatePointersTask race is a false positive as far as I can see. It's the race on vptr described here:
https://github.com/google/sanitizers/wiki/ThreadSanitizerPopularDataRaces#data-race-on-vptr
So the vptr is being updated on entry to the descructor. This is only a problem when calling an inherited destructor for a derived class. In this case UpdatePointersTask is the most derived class and this write will not change the value of the vptr.
We should still fix this if possible. I tried making the derived classes final but this didn't fix the problem in local testing.
There are a couple of other warnings in the log file that I will try to address too. I can't see that any of them could cause this crash though.
Flags: needinfo?(jcoppeard)
Comment 13•7 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #1)
> Here is another crash trace that I see in dozens of variations, but always
> with a similar crash address:
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 js::Shape::updateBaseShapeAfterMovingGC (this=<optimized out>) at
> js/src/vm/Shape-inl.h:130
> #1 JS::Zone::fixupInitialShapeTable (this=this@entry=0xffff9ebea000) at
> js/src/vm/Shape.cpp:2240
(In reply to Jon Coppeard (:jonco) from comment #12)
> There are a couple of other warnings in the log file that I will try to
> address too. I can't see that any of them could cause this crash though.
Just after the middle of the Helgrind log file, there's a report of a race
between
updateEdge<js::Shape>, called from js::gc::MovingTracer::onShapeEdge (write, happens first)
and
shape, called from updateShapeAfterMovingGC (read, happens second)
Given that the read stack seems somewhat similar to the crash stack
(involving updateBaseShapeAfterMovingGC), could that be a relevant?
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 14•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #13)
That's one of the other ones I'm looking at. I don't *think* this is a problem - the code should work whether the read observes
either the old or the new value - but I'm working on a patch for it anyway.
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 15•7 years ago
|
||
There's an off-by-one error in the way GCRuntime::updateCellPointers counts how many tasks it started. This leads to statistics for the final task not being recorded, but we still join on the task because that happens in UpdatePointersTask's destructor.
Attachment #8973252 -
Flags: review?(sphink)
Assignee | ||
Comment 16•7 years ago
|
||
I can see several races related to updating shapes (e.g. the one pointed out in comment 13).
What happens at the moment is that when we call MaybeForwarded for a JSObject* it will not only return the forwarded object pointer if the object is forwarded, but also update the object's shape if that has been forwarded. This was done so that you can actually use the object you get back without crashes when you do anything that touches its shape.
This is a race however: some trace hooks for an object make use of a different associated object (e.g. array buffer views access the array buffer object). So with this setup the shape can get updated in two places, once when the object itself is updated and once when updating another object that also references it.
It turns out we don't really need to update the shape though. The only place it's used is for checking the object's slot count in various debug assertions. We can call MaybeForwarded when reading the shape in these assertions instead.
Attachment #8973257 -
Flags: review?(sphink)
Assignee | ||
Comment 17•7 years ago
|
||
Patch to remove virtual GCParallelTask::run() and replace with function pointer.
This works around the benign race descrived in comment 12. Overall this turned out nicer than I expected, but it still kinda sucks to have to do this.
Attachment #8973258 -
Flags: review?(sphink)
Assignee | ||
Comment 18•7 years ago
|
||
I'm not confident that any of these patches will fix the underlying issue, but let's try it and see.
Assignee | ||
Comment 19•7 years ago
|
||
Comment on attachment 8973257 [details] [diff] [review]
compacting-update-races
Cancelling review due to crashes on try.
Attachment #8973257 -
Flags: review?(sphink)
Comment 20•7 years ago
|
||
Comment on attachment 8973252 [details] [diff] [review]
compacting-thread-count
Review of attachment 8973252 [details] [diff] [review]:
-----------------------------------------------------------------
Heh. Subtle.
Attachment #8973252 -
Flags: review?(sphink) → review+
Comment 21•7 years ago
|
||
Comment on attachment 8973258 [details] [diff] [review]
compacting-run-method
Review of attachment 8973258 [details] [diff] [review]:
-----------------------------------------------------------------
Ouch, unfortunate. I like your use of CRTP to keep it as clean as it can be, though.
Attachment #8973258 -
Flags: review?(sphink) → review+
Comment 22•7 years ago
|
||
Comment on attachment 8973257 [details] [diff] [review]
compacting-update-races
Review of attachment 8973257 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/vm/NativeObject-inl.h
@@ +603,5 @@
> +NativeObject::slotSpanMaybeForwarded() const
> +{
> + Shape* shape = MaybeForwarded(lastProperty());
> + if (shape->inDictionary())
> + return shape->base()->slotSpan();
This hasn't changed AFAICT, but just so I understand -- why don't you need to do MaybeForwarded(shape->base())->slotSpan() here?
Assignee | ||
Comment 23•7 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] (PTO June 31) from comment #22)
> This hasn't changed AFAICT, but just so I understand -- why don't you need
> to do MaybeForwarded(shape->base())->slotSpan() here?
Shapes have already been updated by this point - see the comment here: https://searchfox.org/mozilla-central/source/js/src/gc/GC.cpp#2835
Assignee | ||
Comment 24•7 years ago
|
||
This changes makes ArrayBufferObject use get/setFixedSlot everywhere and changes the assertions in those methods to a version that works if the shape has been forwarded. I had to add another method to set the private given the number of fixed slots. This is green on try now.
Attachment #8973257 -
Attachment is obsolete: true
Attachment #8973940 -
Flags: review?(sphink)
Assignee | ||
Comment 25•7 years ago
|
||
Decoder, can you try testing with the attached patches to see if they make any difference? Hopefully helgrind will report fewer races at least.
Flags: needinfo?(choller)
Assignee | ||
Comment 26•7 years ago
|
||
Rebased patch.
Assignee: nobody → jcoppeard
Attachment #8973258 -
Attachment is obsolete: true
Attachment #8973970 -
Flags: review+
Comment 27•7 years ago
|
||
Jon, could these GC crashes affect x86 or 32-bit ARM platforms? This bug was filed as "ARM64 hardware only", but your patches don't appear to have any ARM64-specific code.
status-firefox62:
--- → affected
Component: JavaScript Engine → JavaScript: GC
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 28•7 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #27)
It's unlikely that they affect x86 and it's possible they affect 32bit ARM platforms. I don't know for sure what the root of the problem is but the weaker memory model on ARM is probably a factor.
Flags: needinfo?(jcoppeard)
Updated•7 years ago
|
Attachment #8973940 -
Flags: review?(sphink) → review+
Comment 29•7 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #23)
> (In reply to Steve Fink [:sfink] [:s:] (PTO June 31) from comment #22)
> > This hasn't changed AFAICT, but just so I understand -- why don't you need
> > to do MaybeForwarded(shape->base())->slotSpan() here?
>
> Shapes have already been updated by this point - see the comment here:
> https://searchfox.org/mozilla-central/source/js/src/gc/GC.cpp#2835
Oh! Right, sorry, I was getting mixed up between pointers and pointees, and thinking "hey, you needed a forwarding check for the shape, so why not for the base shape that's updated in the same pass?"
Assignee | ||
Comment 30•7 years ago
|
||
Comment on attachment 8973252 [details] [diff] [review]
compacting-thread-count
[Security approval request comment]
Requesting approval to land these patches as even though they might not fix this problem, they are an improvement in general and reduce the number of possible races by the compacting code.
How easily could an exploit be constructed based on the patch? It's not clear that's possible so I'd say extremely difficult.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.
Which older supported branches are affected by this flaw? Everything back to FF 48.
If not all supported branches, which bug introduced the flaw? Bug 1257186.
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Should be trivial if necessary.
How likely is this patch to cause regressions; how much testing does it need? Unlikely.
Attachment #8973252 -
Flags: sec-approval?
Assignee | ||
Comment 31•7 years ago
|
||
Comment on attachment 8973940 [details] [diff] [review]
compacting-update-races v2
[Security approval request comment]
Again, this patch is not necessarily a fix but removes a possible source of races.
How easily could an exploit be constructed based on the patch? Very difficult.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.
Which older supported branches are affected by this flaw? Everything back to FF 48.
If not all supported branches, which bug introduced the flaw? Bug 1257903.
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Should be trivial if necessary.
How likely is this patch to cause regressions; how much testing does it need? Unlikely.
Attachment #8973940 -
Flags: sec-approval?
Assignee | ||
Comment 32•7 years ago
|
||
Comment on attachment 8973970 [details] [diff] [review]
compacting-run-method v2
[Security approval request comment]
Again, this removes another possible source of races.
How easily could an exploit be constructed based on the patch? Very difficult.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.
Which older supported branches are affected by this flaw? Everything back to FF 35.
If not all supported branches, which bug introduced the flaw? Bug 1064578.
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Should be trivial.
How likely is this patch to cause regressions; how much testing does it need? Unlikely.
Attachment #8973970 -
Flags: sec-approval?
Comment 33•7 years ago
|
||
sec-approval+ for landing these patches.
Updated•7 years ago
|
Attachment #8973252 -
Flags: sec-approval? → sec-approval+
Updated•7 years ago
|
Attachment #8973940 -
Flags: sec-approval? → sec-approval+
Updated•7 years ago
|
Attachment #8973970 -
Flags: sec-approval? → sec-approval+
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Assignee | ||
Comment 34•7 years ago
|
||
Comment 35•7 years ago
|
||
Reporter | ||
Comment 36•7 years ago
|
||
This is an automated crash issue comment:
Summary: Crash [@ js::ShapeTable::searchUnchecked<(js::MaybeAdding)0>]
Build version: mozilla-central revision 21f09d7e7214
Build flags: --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-debug
Runtime options: --fuzzing-safe --cpu-count=2 --arm-asm-nop-fill=1 --arm-sim-icache-checks --test-wasm-await-tier2
Backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 js::ShapeTable::searchUnchecked<(js::MaybeAdding)0> (id=..., this=<optimized out>) at js/src/vm/Shape-inl.h:292
#1 js::ShapeTable::search<(js::MaybeAdding)0> (id=..., this=<optimized out>) at js/src/vm/Shape-inl.h:371
#2 js::Shape::search<(js::MaybeAdding)0> (id=..., start=<optimized out>, cx=0xffffa4415000) at js/src/vm/Shape-inl.h:96
#3 js::Shape::search (id=..., cx=0xffffa4415000, this=<optimized out>) at js/src/vm/Shape-inl.h:48
#4 js::jit::SetNativeDataProperty<true> (cx=0xffffa4415000, obj=0x7000017e060, name=0x7000019c780, val=0xffffdf85c888) at js/src/jit/VMFunctions.cpp:1717
#5 0x000023c851a508d8 in ?? ()
x0 0x8 8
x1 0xbf88be52 3213409874
x2 0x0 0
x3 0x4b4b4b4b 5425512962855750475
x4 0xa44045b8 281473437418936
x5 0x0 0
x6 0x1 1
x7 0x0 0
x8 0xa4419c10 281473437506576
x9 0xa41f4280 281473435255424
x10 0x323ef0 3292912
x11 0x18 24
x12 0x18 24
x13 0x2598f2e 39423790
x14 0xb0000000 11884838885785600
x15 0x0 16777216000000000
x16 0x7d4bf0 8211440
x17 0x1fffc 131068
x18 0xa447b070 281473437905008
x19 0x1edd80 7696583417216
x20 0xf46000 16015360
x21 0x17e060 7696582959200
x22 0xdf85c888 281474431830152
x23 0x19c780 7696583083904
x24 0xa4415000 281473437487104
x25 0xef7000 15691776
x26 0xa3fea0b8 281473433116856
x27 0x0 0
x28 0xdf85c800 281474431830016
x29 0xdf85c7a0 281474431829920
x30 0x51a508d8 39343270201560
sp 0xdf85c7a0 281474431829920
pc 0x7d4c68 <js::jit::SetNativeDataProperty<true>(JSContext*, JSObject*, js::PropertyName*, JS::Value*)+120>
cpsr 0x80000000 -2147483648
fpcsr void
fpcr 0x0 0
=> 0x7d4c68 <js::jit::SetNativeDataProperty<true>(JSContext*, JSObject*, js::PropertyName*, JS::Value*)+120>: ldr w2, [x3]
0x7d4c6c <js::jit::SetNativeDataProperty<true>(JSContext*, JSObject*, js::PropertyName*, JS::Value*)+124>: ldr x5, [x3,#16]
Looks like there are still crashes, even with the patches here landed. I will try to find some test that we can run Helgrind on, maybe it shows something else.
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(choller)
Assignee | ||
Comment 37•7 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #36)
> Looks like there are still crashes, even with the patches here landed.
Yes, the compacting-update-races patch has caused a couple more fuzz bugs. Overall it's still a simplification of our approach and worth doing though.
Reporter | ||
Comment 38•7 years ago
|
||
So I ran Helgrind on another random GC crashing test and caught this race report:
==18222== Possible data race during write of size 8 at 0x5DFF8E0 by thread #4
==18222== Locks held: none
==18222== at 0xDE4908: bitwiseOrRangeInto (Bitmap.h:53)
==18222== by 0xDE4908: AddBitmapToChunkMarkBits<js::DenseBitmap> (AtomMarking.cpp:125)
==18222== by 0xDE4908: js::gc::AtomMarkingRuntime::updateChunkMarkBits(JSRuntime*) (AtomMarking.cpp:148)
==18222== by 0xDF3BDF: UpdateAtomsBitmap(js::GCParallelTask*) (GC.cpp:5400)
==18222== by 0xAA8A1B: runTask (GCParallelTask.h:127)
==18222== by 0xAA8A1B: js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1586)
==18222== by 0xAAEFD3: js::HelperThread::handleGCParallelWorkload(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1612)
==18222== by 0xAA74EF: js::HelperThread::threadLoop() (HelperThreads.cpp:2385)
==18222== by 0xAA761B: js::HelperThread::ThreadMain(void*) (HelperThreads.cpp:1869)
==18222== by 0xABCC07: callMain<0ul> (Thread.h:242)
==18222== by 0xABCC07: js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) (Thread.h:235)
==18222== by 0x484BA37: mythread_wrapper (hg_intercepts.c:389)
==18222== by 0x486AFC3: start_thread (pthread_create.c:335)
==18222== by 0x4BDA2DF: thread_start (clone.S:89)
==18222==
==18222== This conflicts with a previous read of size 8 by thread #3
==18222== Locks held: none
==18222== at 0xBC7B04: markBit (Heap.h:611)
==18222== by 0xBC7B04: isMarkedAny (Heap.h:615)
==18222== by 0xBC7B04: js::gc::TenuredCell::isMarkedAny() const (Cell.h:286)
==18222== by 0xE3D96B: js::gc::IsAboutToBeFinalizedDuringSweep(js::gc::TenuredCell&) (Marking.cpp:3389)
==18222== by 0xE5F5AF: operator()<JSString> (Marking.cpp:3423)
==18222== by 0xE5F5AF: decltype ({parm#1}(static_cast<JSString*>((decltype(nullptr))0), (Forward<bool*>)({parm#3}))) js::DispatchTyped<IsAboutToBeFinalizedInternalFunctor<jsid>, bool*>(IsAboutToBeFinalizedInternalFunctor<jsid>, jsid const&, bool*&&) (Id.h:228)
==18222== by 0xE5F607: IsAboutToBeFinalizedInternal<jsid> (Marking.cpp:3433)
==18222== by 0xE5F607: bool js::gc::IsAboutToBeFinalizedUnbarriered<jsid>(jsid*) (Marking.cpp:3458)
==18222== by 0xB64D0B: needsSweep (ObjectGroup.cpp:1083)
==18222== by 0xB64D0B: needsSweep (GCPolicyAPI.h:87)
==18222== by 0xB64D0B: needsSweep (ObjectGroup.cpp:1772)
==18222== by 0xB64D0B: sweep (GCHashTable.h:83)
==18222== by 0xB64D0B: js::ObjectGroupCompartment::sweep() (ObjectGroup.cpp:1790)
==18222== by 0xDF5D17: SweepObjectGroups(js::GCParallelTask*) (GC.cpp:5422)
==18222== by 0xAA8A1B: runTask (GCParallelTask.h:127)
==18222== by 0xAA8A1B: js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1586)
==18222== by 0xAAEFD3: js::HelperThread::handleGCParallelWorkload(js::AutoLockHelperThreadState&) (HelperThreads.cpp:1612)
==18222== Address 0x5dff8e0 is in a rw- anonymous segment
This is on m-c rev 45ec8fd380dd with a debug build. There was another race related to wasm that I will ask :bbouvier about because it didn't look related to this bug. Let me know if this looks related/helpful.
Flags: needinfo?(jcoppeard)
Comment 39•7 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #38)
> ==18222== This conflicts with a previous read of size 8 by thread #3
> ==18222== Locks held: none
> ==18222== at 0xBC7B04: markBit (Heap.h:611)
I see that markBit() is marked MOZ_TSAN_BLACKLIST, so presumably it's a
known race. Is it harmless, though?
Assignee | ||
Comment 40•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #39)
> I see that markBit() is marked MOZ_TSAN_BLACKLIST, so presumably it's a
> known race. Is it harmless, though?
The race that prompted this annotation was between gray unmarking a cell on the main thread (writing) and sweeping a different cell on a background thread (reading), where the mark bits for different cells end up in the same word.
This is a different issue. I'm going to spin this off into a separate bug.
Flags: needinfo?(jcoppeard)
Reporter | ||
Comment 41•7 years ago
|
||
I don't do this often... but https://imgflip.com/i/2abxi0
Assignee | ||
Comment 42•7 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #41)
Bug 1424934 could remedy this situation.
Comment 43•7 years ago
|
||
I notice ARM64 specific JIT crashes in Bug 1461480 that fault at 1MB aligned addresses. Probably unrelated.
Comment 44•6 years ago
|
||
Jon, can you update on the status of this bug as it applies to non-trunk branches? Are these general GC issues that the ARM64 hardware just seems to trigger, an ARM64-only issue, or something else? Just trying to understand what if anything we need to do with respect to backports here.
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 45•6 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #44)
These are not ARM64-specific issues so we should backport.
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 46•6 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #38)
I filed bug 1461619 for race reported by Helgrind in comment 38.
Assignee | ||
Comment 47•6 years ago
|
||
Comment on attachment 8973252 [details] [diff] [review]
compacting-thread-count
Approval Request Comment
[Feature/Bug causing the regression]: Bug 1257186.
[User impact if declined]: Possible crash / security vulnerability.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This is a very simple change and is covered by assertions in the patch.
[String changes made/needed]: None.
Attachment #8973252 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 48•6 years ago
|
||
Comment on attachment 8973970 [details] [diff] [review]
compacting-run-method v2
Approval Request Comment
[Feature/Bug causing the regression]: Bug 1064578.
[User impact if declined]: Possible crash / security vulnerability.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: Low risk.
[Why is the change risky/not risky?]: This is a fairly mechanical change that only affects the GC.
[String changes made/needed]: None.
Attachment #8973970 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 49•6 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #45)
I'm not sure whether we should backport the compacting-update-races patch. This is a complicated change that has already required a couple of fixes. Given that we didn't have any evidence that this was causing problems, I'm inclined to let it ride. The other patches are much simpler and are worth backporting just in case.
Assignee | ||
Comment 50•6 years ago
|
||
Comment on attachment 8973970 [details] [diff] [review]
compacting-run-method v2
Cancelling approval request and moving to bug 1465108.
Attachment #8973970 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 51•6 years ago
|
||
Comment on attachment 8973252 [details] [diff] [review]
compacting-thread-count
Cancelling approval request and moving to bug 1465108.
Attachment #8973252 -
Flags: approval-mozilla-beta?
Updated•6 years ago
|
Attachment #8973252 -
Flags: checkin+
Updated•6 years ago
|
Attachment #8973940 -
Flags: checkin+
Updated•6 years ago
|
Attachment #8973970 -
Flags: checkin+
Comment 52•6 years ago
|
||
This bug is getting to be difficult to track given how patches are landing. I'm going to propose that we keep this bug open as a meta bug at this point and file new bugs blocking this one for any subsequent patches that need to land. Also dropping tracking+ status for the various releases for that reason (we can tracking+ the new bugs as they come in).
Christian, how are things looking these days?
tracking-firefox-esr60:
61+ → ---
Flags: needinfo?(choller)
Reporter | ||
Comment 53•6 years ago
|
||
I'm still seeing the random GC crashes unfortunately (see comment 36). I'm trying to get TSan to work and use Valgrind/Helgrind on more of the non-reproducing tests, but I hit difficulties with both tools due to the architecture: TSan is currently just hanging on startup (I've contacted the developers about this) and Helgrind is either not detecting new races or also hanging in the middle of some tests. I'll ask Julien for help as well.
Flags: needinfo?(choller)
Reporter | ||
Comment 54•6 years ago
|
||
I mentioned this to some people before, but I feel like it should be documented here. In conjunction with the crashes, I see a lot of these in dmesg:
[4838736.964530] Unhandled fault: TLB conflict abort (0x92000030) at 0x0000ffff98042000
Not sure if this is the result of us crashing or the cause.
Assignee | ||
Comment 55•6 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #54)
> [4838736.964530] Unhandled fault: TLB conflict abort (0x92000030) at
> 0x0000ffff98042000
It sounds like this is due to incorrect TLB maintenance, so more of an OS-level problem.
Marking this fix-optional for 62 to remove it from triage, since there are crashes visible in crash-stats for 62 with this signature in the last month, and this is kind of a placeholder/meta bug now. If anything comes up and we have a patch for 62/63, it should show up in a dependent bug.
Assignee | ||
Comment 57•6 years ago
|
||
I'm closing this as invalid based on comment 55.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INVALID
Updated•6 years ago
|
Keywords: leave-open
Updated•5 years ago
|
Updated•5 years ago
|
Group: javascript-core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•