Closed Bug 1741182 Opened 3 years ago Closed 3 years ago

ReleaseWorkerRunnable control runnable does not call superclass WorkerRunnable::Cancel. [Assertion failure: IsCanceled() (Subclass Cancel() didn't set IsCanceled()!), at /dom/workers/WorkerRunnable.cpp:253]

Categories

(Core :: DOM: Workers, defect, P3)

x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
98 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 --- verified

People

(Reporter: jkratzer, Assigned: jstutte)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression, testcase, Whiteboard: [bugmon:bisected,confirmed])

Attachments

(2 files)

Testcase found while fuzzing mozilla-central rev 0ea31fd939c8 (built with: --enable-debug --enable-fuzzing).

Testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 0ea31fd939c8 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.zip --repeat 2
Assertion failure: IsCanceled() (Subclass Cancel() didn't set IsCanceled()!), at /dom/workers/WorkerRunnable.cpp:253

    ==913634==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f31f375f096 bp 0x7f31e4dcf340 sp 0x7f31e4dcf1b0 T913757)
    ==913634==The signal is caused by a WRITE memory access.
    ==913634==Hint: address points to the zero page.
        #0 0x7f31f375f096 in mozilla::dom::WorkerRunnable::Run() /dom/workers/WorkerRunnable.cpp:253:5
        #1 0x7f31f374ec91 in mozilla::dom::WorkerPrivate::ProcessAllControlRunnablesLocked() /dom/workers/WorkerPrivate.cpp:3677:9
        #2 0x7f31f374fc13 in ProcessAllControlRunnables /builds/worker/workspace/obj-build/dist/include/mozilla/dom/WorkerPrivate.h:1050:12
        #3 0x7f31f374fc13 in mozilla::dom::WorkerPrivate::OnProcessNextEvent() /dom/workers/WorkerPrivate.cpp:3175:15
        #4 0x7f31f3769f6b in mozilla::dom::WorkerThread::Observer::OnProcessNextEvent(nsIThreadInternal*, bool) /dom/workers/WorkerThread.cpp:364:19
        #5 0x7f31ef2123e0 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1094:3
        #6 0x7f31ef20f134 in NS_ProcessPendingEvents(nsIThread*, unsigned int) /xpcom/threads/nsThreadUtils.cpp:432:19
        #7 0x7f31f3751858 in mozilla::dom::WorkerPrivate::ClearMainEventQueue(mozilla::dom::WorkerPrivate::WorkerRanOrNot) /dom/workers/WorkerPrivate.cpp:3720:5
        #8 0x7f31f374f04b in mozilla::dom::WorkerPrivate::NotifyInternal(mozilla::dom::WorkerStatus) /dom/workers/WorkerPrivate.cpp:4533:7
        #9 0x7f31f375ecac in mozilla::dom::WorkerRunnable::Run() /dom/workers/WorkerRunnable.cpp:378:12
        #10 0x7f31f374ec91 in mozilla::dom::WorkerPrivate::ProcessAllControlRunnablesLocked() /dom/workers/WorkerPrivate.cpp:3677:9
        #11 0x7f31f374df07 in mozilla::dom::WorkerPrivate::DoRunLoop(JSContext*) /dom/workers/WorkerPrivate.cpp:3004:21
        #12 0x7f31f372e407 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /dom/workers/RuntimeService.cpp:2244:42
        #13 0x7f31ef212879 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1169:16
        #14 0x7f31ef21999a in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:467:10
        #15 0x7f31efca7e8b in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:300:20
        #16 0x7f31efbc6307 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:331:10
        #17 0x7f31efbc6212 in RunHandler /ipc/chromium/src/base/message_loop.cc:324:3
        #18 0x7f31efbc6212 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:306:3
        #19 0x7f31ef20e4eb in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:391:10
        #20 0x7f32043c6a07 in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
        #21 0x7f320513a608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
        #22 0x7f3204d02292 in __clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    
    UndefinedBehaviorSanitizer can not provide additional info.
    SUMMARY: UndefinedBehaviorSanitizer: SEGV /dom/workers/WorkerRunnable.cpp:253:5 in mozilla::dom::WorkerRunnable::Run()
    ==913634==ABORTING
Attached file: Testcase (deleted)

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20211115093917-0ea31fd939c8.
The bug appears to have been introduced in the following build range:

Start: 9b2e412995e62775bbc37a013354a6c964e25e69 (20210929123904)
End: 68940497078c0bd6d8101a180f98a686bf9a78c3 (20210929130821)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=9b2e412995e62775bbc37a013354a6c964e25e69&tochange=68940497078c0bd6d8101a180f98a686bf9a78c3

Whiteboard: [bugmon:confirm] → [bugmon:bisected,confirmed]

That pushlog points to bug 1722576.

Flags: needinfo?(evilpies)
Severity: -- → S3
Priority: -- → P3

So far I can't reproduce this assertion. Is it possible that this test case doesn't actually depend on structuredClone?

Flags: needinfo?(evilpies)

Tom, the call to structuredClone() is required for me to trigger the bug. You may have better luck reproducing the issue by increasing the repeat count like so:

python -m grizzly.replay ./firefox/firefox testcase.zip --repeat 100 --relaunch 2

I can't reproduce this. Maybe someone should try to investigate.

[2021-11-23 20:21:33] Running test (100/100)...
[2021-11-23 20:21:36] Failed to reproduce results

(In reply to Jason Kratzer [:jkratzer] from comment #5)

Tom, the call to structuredClone() is required for me to trigger the bug. You may have better luck reproducing the issue by increasing the repeat count like so:

python -m grizzly.replay ./firefox/firefox testcase.zip --repeat 100 --relaunch 2

Jason, does this still reproduce for you? And I assume you are already trying to reproduce this with pernosco? Thanks!

Flags: needinfo?(jkratzer)

(In reply to Jens Stutte [:jstutte] from comment #7)

(In reply to Jason Kratzer [:jkratzer] from comment #5)

Tom, the call to structuredClone() is required for me to trigger the bug. You may have better luck reproducing the issue by increasing the repeat count like so:

python -m grizzly.replay ./firefox/firefox testcase.zip --repeat 100 --relaunch 2

Jason, does this still reproduce for you? And I assume you are already trying to reproduce this with pernosco? Thanks!

Jens, yes - this still reproduces for me on m-c 20211129-d03f87555639. I'm trying to get a pernosco session but so far no luck.

Flags: needinfo?(jkratzer) → needinfo?(evilpies)

You successfully logged in, but either you are not authorized to view this trace OR the debugging database for this trace has expired (typically 7 days after the trace was collected) and needs to be rebuilt.

Flags: needinfo?(evilpies)

(In reply to Tom Schuster [:evilpie] from comment #10)

You successfully logged in, but either you are not authorized to view this trace OR the debugging database for this trace has expired (typically 7 days after the trace was collected) and needs to be rebuilt.

Please try this: https://github.com/Pernosco/pernosco/wiki/Login-Troubleshooting

Flags: needinfo?(evilpies)
Flags: needinfo?(evilpies)

I don't have a @mozilla account obviously, so I was a bit hesitant to do this. But now that we are also getting bug 1749002 on try, we should do something. Can someone who actually knows worker code look at this and see if bug 1749002 is related as well?

Flags: needinfo?(jstutte)

Sorry, I did not think about the access to pernosco.

Eden, can you help to take a look?

Flags: needinfo?(jstutte) → needinfo?(echuang)

The pernosco trace shows that the WeakWorkerRef ReleaseWorkerRunnable::Cancel is failing to call the WorkerRunnable::Cancel like PerformanceEntryAdded does, for example.

Note that it continues to be nonsensical that WorkerControlRunnables can be canceled, but that's not something we're going to fix in this bug (and I think there's an existing bug).
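
To illustrate the failure mode described above, here is a minimal sketch in plain C++. It is not the Gecko code: RunnableBase, BrokenRelease and CancelUpholdsInvariant are hypothetical names, and the bool return value is an assumption made for brevity. It only shows how an override that never reaches the base class leaves IsCanceled() false, which is the condition the assertion checks.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for WorkerRunnable (not the Gecko class).
class RunnableBase {
 public:
  virtual ~RunnableBase() = default;

  // Marks the runnable as canceled. Returns false if it was already canceled.
  virtual bool Cancel() {
    if (mCanceled) {
      return false;
    }
    mCanceled = true;
    return true;
  }

  bool IsCanceled() const { return mCanceled; }

 private:
  bool mCanceled = false;
};

// Models the reported defect: the override does its own cleanup but never
// calls RunnableBase::Cancel(), so the canceled flag is never set.
class BrokenRelease : public RunnableBase {
 public:
  bool Cancel() override {
    mReleased = true;  // subclass cleanup only
    return true;
  }

  bool mReleased = false;
};

// Models the invariant behind the assertion: once Cancel() has been called,
// IsCanceled() is expected to be true.
bool CancelUpholdsInvariant(RunnableBase& aRunnable) {
  aRunnable.Cancel();
  return aRunnable.IsCanceled();
}
```

In this model, CancelUpholdsInvariant returns false for a BrokenRelease instance and true for a plain RunnableBase, which corresponds to the "Subclass Cancel() didn't set IsCanceled()!" message.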

Summary: Assertion failure: IsCanceled() (Subclass Cancel() didn't set IsCanceled()!), at /dom/workers/WorkerRunnable.cpp:253 → ReleaseWorkerRunnable control runnable does not call superclass WorkerRunnable::Cancel. [Assertion failure: IsCanceled() (Subclass Cancel() didn't set IsCanceled()!), at /dom/workers/WorkerRunnable.cpp:253]

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #14)

The pernosco trace shows that the WeakWorkerRef ReleaseWorkerRunnable::Cancel is failing to call the WorkerRunnable::Cancel like PerformanceEntryAdded does, for example.

Would this apply also to the CrashIfHangingRunnable ?

Flags: needinfo?(echuang) → needinfo?(bugmail)

(In reply to Jens Stutte [:jstutte] from comment #15)

Would this apply also to the CrashIfHangingRunnable ?

Yes.

Flags: needinfo?(bugmail)
Assignee: nobody → jstutte
Status: NEW → ASSIGNED

This seemed straightforward enough to just do it, but while doing so I noticed that we also need to be more careful about the order inside Cancel in the places where we already called the base class' function: we need to ensure that the base class' function is called first, and bail out in case it fails. I hope I understood this right (see patch).
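
The ordering described in this comment can be sketched as follows. This is an illustration under assumptions, not the landed patch: CancelableBase and FixedRelease are hypothetical names, and the bool return value stands in for whatever result type the real Cancel uses.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for WorkerRunnable (not the Gecko class).
class CancelableBase {
 public:
  virtual ~CancelableBase() = default;

  // Marks the runnable as canceled. Returns false if it was already canceled.
  virtual bool Cancel() {
    if (mCanceled) {
      return false;
    }
    mCanceled = true;
    return true;
  }

  bool IsCanceled() const { return mCanceled; }

 private:
  bool mCanceled = false;
};

// Override following the ordering from the comment: base class first,
// bail out if it reports failure, then do the subclass-specific work.
class FixedRelease : public CancelableBase {
 public:
  bool Cancel() override {
    if (!CancelableBase::Cancel()) {
      return false;  // already canceled; skip the cleanup
    }
    ++mReleaseCount;  // subclass cleanup runs at most once
    return true;
  }

  int mReleaseCount = 0;
};
```

Calling the base class first means the subclass cleanup is skipped on a repeated Cancel, so in this sketch mReleaseCount stays at 1 no matter how often Cancel is called.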

:jstutte, since this bug contains a bisection range, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Flags: needinfo?(jstutte)

(In reply to Release mgmt bot [:marco/ :calixte] from comment #19)

:jstutte, since this bug contains a bisection range, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Actually, it is a bit unfair to say that bug 1722576 regressed this. It just happened to implement a piece of API that the fuzzer then used, but the underlying issue was always there and could have been triggered differently, I assume...

Flags: needinfo?(jstutte)
Regressed by: 1722576
Has Regression Range: --- → yes
Keywords: regression
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9210bf384d60 Harmonize WorkerRunnable derived classes' overrides of Cancel. r=dom-worker-reviewers,smaug
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

Bugmon Analysis
Verified bug as fixed on rev mozilla-central 20220112213002-38711fbec2b1.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Status: RESOLVED → VERIFIED
Keywords: bugmon

Set release status flags based on info from the regressing bug 1722576
