Closed Bug 1637329 Opened 5 years ago Closed 4 years ago

Add support for shimming resources blocked by Tracking Protection

Categories

(Web Compatibility :: Interventions, enhancement, P1)

enhancement

Tracking

(firefox81 fixed)

RESOLVED FIXED
Tracking Status
firefox81 --- fixed

People

(Reporter: miketaylr, Assigned: twisniewski)

References

(Depends on 1 open bug, Blocks 3 open bugs)

Details

Attachments

(1 file)

In order to minimize user-facing site bustage when Strict Tracking Protection is enabled, we would like to have the ability to "shim" certain resources or popular SDKs (for logins, etc).

Depends on: 1639718
Blocks: 1646175
No longer blocks: 1516552
Pushed by twisniewski@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b0b40eebd9ab add support for shimming resources blocking by tracking protection to webcompat built-in addon; r=dimi,robwu,webcompat-reviewers,denschub,miketaylr

robwu, jwatt, I have literally no idea what is going on with the permafailure shown in comment 3 here. Any ideas? I was hoping to land this before I went on PTO (because of the soft-freeze while I'm away), but I would rather not just blindly disable the shims this patch adds when e10s is off, without knowing what the error actually is. Why would this test get even worse with my patch, and does it indicate a deeper problem we might need to address? (I see the test was already disabled on Windows recently, so I have to wonder). Or if it's somehow deemed alright to land my patch and just disable that test for now, would either of you mind doing that? (I'll be away from my office in a few hours).

Flags: needinfo?(twisniewski)
Flags: needinfo?(rob)
Flags: needinfo?(jwatt)

The stack of the crash from comment 3 points to the Localization destructor, and I don't see an obvious relation between your patch and that. I don't even see retriggers, so I don't know why it is classified as a permafail. I clicked on "Retrigger job" to retrigger the job, and the test passes, so it is clearly not a permafail: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&selectedTaskRun=Ft6WVGk-S5qP4d6j0Hk_qA.0&revision=b0b40eebd9abf2748f85d7c700a2ced36035b14a&searchStr=os%2Cx%2C10.14%2Cdebug%2Cmochitests%2Cwithout%2Ce10s%2Ctest-macosx1014-64%2Fdebug-mochitest-chrome-1proc%2Cc2

The test is already disabled on non-e10s Linux and soon Windows (bug 1583315)., so I think that the test is just unreliable. I suggest to try to reland it.

For posterity, the stack of the crash is:

INFO - GECKO(9300) | Assertion failure: !mNext (Must be unlinked), at /builds/worker/checkouts/gecko/js/xpconnect/src/xpcprivate.h:266
INFO - GECKO(9300) | #01: nsXPCWrappedJS::~nsXPCWrappedJS() [js/xpconnect/src/XPCWrappedJS.cpp:430]
INFO - GECKO(9300) | #02: nsXPCWrappedJS::Release() [js/xpconnect/src/XPCWrappedJS.cpp:271]
INFO - GECKO(9300) | #03: nsXPCWrappedJS::Release() [js/xpconnect/src/XPCWrappedJS.cpp:286]
INFO - GECKO(9300) | #04: mozilla::intl::Localization::~Localization() [intl/l10n/Localization.cpp:146]
INFO - GECKO(9300) | #05: mozilla::intl::Localization::~Localization() [intl/l10n/Localization.cpp:136]
INFO - GECKO(9300) | #06: SuspectAfterShutdown(void*, nsCycleCollectionParticipant*, nsCycleCollectingAutoRefCnt*, bool*) [xpcom/base/nsCycleCollector.cpp:3741]
INFO - GECKO(9300) | #07: mozilla::intl::Localization::Release() [intl/l10n/Localization.cpp:44]
INFO - GECKO(9300) | #08: mozilla::SegmentedVector<nsCOMPtr<nsISupports>, 4096ul, mozilla::MallocAllocPolicy>::SegmentImpl<509ul>::~SegmentImpl() [mfbt/SegmentedVector.h:76]
INFO - GECKO(9300) | #09: mozilla::SegmentedVector<nsCOMPtr<nsISupports>, 4096ul, mozilla::MallocAllocPolicy>::PopLastN(unsigned int) [mfbt/SegmentedVector.h:247]
INFO - GECKO(9300) | #10: mozilla::dom::DeferredFinalizerImpl<nsISupports>::DeferredFinalize(unsigned int, void*) [dom/bindings/BindingUtils.h:2731]
INFO - GECKO(9300) | #11: mozilla::IncrementalFinalizeRunnable::ReleaseNow(bool) [xpcom/base/CycleCollectedJSRuntime.cpp:1553]
INFO - GECKO(9300) | #12: mozilla::CycleCollectedJSRuntime::FinalizeDeferredThings(mozilla::CycleCollectedJSContext::DeferredFinalizeType) [xpcom/base/CycleCollectedJSRuntime.cpp:1627]
INFO - GECKO(9300) | #13: mozilla::CycleCollectedJSRuntime::OnGC(JSContext*, JSGCStatus, JS::GCReason) [xpcom/base/CycleCollectedJSRuntime.cpp:1706]
INFO - GECKO(9300) | #14: js::gc::GCRuntime::maybeCallGCCallback(JSGCStatus, JS::GCReason) [js/src/gc/GC.cpp:7033]
INFO - GECKO(9300) | #15: js::gc::GCRuntime::gcCycle(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) [js/src/gc/GC.cpp:7124]
INFO - GECKO(9300) | #16: js::gc::GCRuntime::collect(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) [js/src/gc/GC.cpp:7347]
INFO - GECKO(9300) | #17: js::gc::GCRuntime::gc(JSGCInvocationKind, JS::GCReason) [js/src/gc/GC.cpp:7423]
INFO - GECKO(9300) | #18: JSRuntime::destroyRuntime() [js/src/vm/Runtime.cpp:290]
INFO - GECKO(9300) | #19: js::DestroyContext(JSContext*) [js/src/vm/JSContext.cpp:214]
INFO - GECKO(9300) | #20: mozilla::CycleCollectedJSContext::~CycleCollectedJSContext() [xpcom/base/CycleCollectedJSContext.cpp:109]
INFO - GECKO(9300) | #21: XPCJSContext::~XPCJSContext() [js/xpconnect/src/XPCJSContext.cpp:1063]
INFO - GECKO(9300) | #22: XPCJSContext::~XPCJSContext() [js/xpconnect/src/XPCJSContext.cpp:1027]
INFO - GECKO(9300) | #23: nsXPConnect::~nsXPConnect() [js/xpconnect/src/nsXPConnect.cpp:128]
INFO - GECKO(9300) | #24: nsXPConnect::~nsXPConnect() [js/xpconnect/src/nsXPConnect.cpp:97]
INFO - GECKO(9300) | #25: nsXPConnect::ReleaseXPConnectSingleton() [js/xpconnect/src/nsXPConnect.cpp:164]
INFO - GECKO(9300) | #26: nsComponentManagerImpl::Shutdown() [xpcom/components/nsComponentManager.cpp:933]
INFO - GECKO(9300) | #27: mozilla::ShutdownXPCOM(nsIServiceManager*) [xpcom/build/XPCOMInit.cpp:737]
INFO - GECKO(9300) | #28: ScopedXPCOMStartup::~ScopedXPCOMStartup() [toolkit/xre/nsAppRunner.cpp:1286]
INFO - GECKO(9300) | #29: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:4926]
INFO - GECKO(9300) | #30: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:4963]
INFO - GECKO(9300) | #31: main [/Users/cltbld/tasks/task_1595082452/build/application/Firefox NightlyDebug.app/Contents/MacOS/firefox + 0x1624]
INFO - Not taking screenshot here: see the one that was previously logged
INFO - TEST-UNEXPECTED-FAIL | browser/components/shell/test/test_headless_screenshot.html | Firefox process should exit with code 0 - got 1, expected +0
Flags: needinfo?(rob)

Ok, I've queued it up for landing, hopefully it sticks this time. (Sorry otherwise!)

Pushed by twisniewski@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fc5242f4332d add support for shimming resources blocking by tracking protection to webcompat built-in addon; r=dimi,robwu,webcompat-reviewers,denschub,miketaylr

Backed out for assertion failures on xpcprivate.h

backout: https://hg.mozilla.org/integration/autoland/rev/015515bcba1f358432cad988600672d027fd125d

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&searchStr=os%2Cx%2C10.14%2Cdebug%2Cmochitests%2Cwithout%2Ce10s%2Ctest-macosx1014-64%2Fdebug-mochitest-chrome-1proc%2Cc2&revision=fc5242f4332df3d6abe1ea79c23b94f05b13c2e2&selectedTaskRun=SaBkEoTjQVGIcWO5c8qhFQ.0

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=310325896&repo=autoland&lineNumber=1390

[task 2020-07-19T23:55:29.208Z] 23:55:29 INFO - GECKO(1159) | Assertion failure: !mNext (Must be unlinked), at /builds/worker/checkouts/gecko/js/xpconnect/src/xpcprivate.h:266
[task 2020-07-19T23:55:29.224Z] 23:55:29 INFO - Initializing stack-fixing for the first stack frame, this may take a while...
[task 2020-07-19T23:55:49.707Z] 23:55:49 INFO - GECKO(1159) | #01: nsXPCWrappedJS::~nsXPCWrappedJS() [js/xpconnect/src/XPCWrappedJS.cpp:430]
[task 2020-07-19T23:55:49.708Z] 23:55:49 INFO - GECKO(1159) | #02: nsXPCWrappedJS::Release() [js/xpconnect/src/XPCWrappedJS.cpp:271]
[task 2020-07-19T23:55:49.708Z] 23:55:49 INFO - GECKO(1159) | #03: nsXPCWrappedJS::Release() [js/xpconnect/src/XPCWrappedJS.cpp:286]
[task 2020-07-19T23:55:49.709Z] 23:55:49 INFO - GECKO(1159) | #04: mozilla::intl::Localization::~Localization() [intl/l10n/Localization.cpp:146]
[task 2020-07-19T23:55:49.710Z] 23:55:49 INFO - GECKO(1159) | #05: mozilla::intl::Localization::~Localization() [intl/l10n/Localization.cpp:136]
[task 2020-07-19T23:55:49.710Z] 23:55:49 INFO - GECKO(1159) | #06: SuspectAfterShutdown(void*, nsCycleCollectionParticipant*, nsCycleCollectingAutoRefCnt*, bool*) [xpcom/base/nsCycleCollector.cpp:3741]
[task 2020-07-19T23:55:49.710Z] 23:55:49 INFO - GECKO(1159) | #07: mozilla::intl::Localization::Release() [intl/l10n/Localization.cpp:44]
[task 2020-07-19T23:55:49.711Z] 23:55:49 INFO - GECKO(1159) | #08: mozilla::SegmentedVector<nsCOMPtr<nsISupports>, 4096ul, mozilla::MallocAllocPolicy>::SegmentImpl<509ul>::~SegmentImpl() [mfbt/SegmentedVector.h:76]
[task 2020-07-19T23:55:49.711Z] 23:55:49 INFO - GECKO(1159) | #09: mozilla::SegmentedVector<nsCOMPtr<nsISupports>, 4096ul, mozilla::MallocAllocPolicy>::PopLastN(unsigned int) [mfbt/SegmentedVector.h:247]
[task 2020-07-19T23:55:49.715Z] 23:55:49 INFO - GECKO(1159) | #10: mozilla::dom::DeferredFinalizerImpl<nsISupports>::DeferredFinalize(unsigned int, void*) [dom/bindings/BindingUtils.h:2731]
[task 2020-07-19T23:55:49.716Z] 23:55:49 INFO - GECKO(1159) | #11: mozilla::IncrementalFinalizeRunnable::ReleaseNow(bool) [xpcom/base/CycleCollectedJSRuntime.cpp:1553]
[task 2020-07-19T23:55:49.716Z] 23:55:49 INFO - GECKO(1159) | #12: mozilla::CycleCollectedJSRuntime::FinalizeDeferredThings(mozilla::CycleCollectedJSContext::DeferredFinalizeType) [xpcom/base/CycleCollectedJSRuntime.cpp:1627]
[task 2020-07-19T23:55:49.716Z] 23:55:49 INFO - GECKO(1159) | #13: mozilla::CycleCollectedJSRuntime::OnGC(JSContext*, JSGCStatus, JS::GCReason) [xpcom/base/CycleCollectedJSRuntime.cpp:1706]
[task 2020-07-19T23:55:49.716Z] 23:55:49 INFO - GECKO(1159) | #14: js::gc::GCRuntime::maybeCallGCCallback(JSGCStatus, JS::GCReason) [js/src/gc/GC.cpp:7033]
[task 2020-07-19T23:55:49.717Z] 23:55:49 INFO - GECKO(1159) | #15: js::gc::GCRuntime::gcCycle(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) [js/src/gc/GC.cpp:7124]
[task 2020-07-19T23:55:49.717Z] 23:55:49 INFO - GECKO(1159) | #16: js::gc::GCRuntime::collect(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) [js/src/gc/GC.cpp:7347]
[task 2020-07-19T23:55:49.717Z] 23:55:49 INFO - GECKO(1159) | #17: js::gc::GCRuntime::gc(JSGCInvocationKind, JS::GCReason) [js/src/gc/GC.cpp:7423]
[task 2020-07-19T23:55:49.717Z] 23:55:49 INFO - GECKO(1159) | #18: JSRuntime::destroyRuntime() [js/src/vm/Runtime.cpp:290]
[task 2020-07-19T23:55:49.718Z] 23:55:49 INFO - GECKO(1159) | #19: js::DestroyContext(JSContext*) [js/src/vm/JSContext.cpp:214]
[task 2020-07-19T23:55:49.718Z] 23:55:49 INFO - GECKO(1159) | #20: mozilla::CycleCollectedJSContext::~CycleCollectedJSContext() [xpcom/base/CycleCollectedJSContext.cpp:109]
[task 2020-07-19T23:55:49.718Z] 23:55:49 INFO - GECKO(1159) | #21: XPCJSContext::~XPCJSContext() [js/xpconnect/src/XPCJSContext.cpp:1063]
[task 2020-07-19T23:55:49.718Z] 23:55:49 INFO - GECKO(1159) | #22: XPCJSContext::~XPCJSContext() [js/xpconnect/src/XPCJSContext.cpp:1027]
[task 2020-07-19T23:55:49.718Z] 23:55:49 INFO - GECKO(1159) | #23: nsXPConnect::~nsXPConnect() [js/xpconnect/src/nsXPConnect.cpp:128]
[task 2020-07-19T23:55:49.718Z] 23:55:49 INFO - GECKO(1159) | #24: nsXPConnect::~nsXPConnect() [js/xpconnect/src/nsXPConnect.cpp:97]
[task 2020-07-19T23:55:49.719Z] 23:55:49 INFO - GECKO(1159) | #25: nsXPConnect::ReleaseXPConnectSingleton() [js/xpconnect/src/nsXPConnect.cpp:164]
[task 2020-07-19T23:55:49.719Z] 23:55:49 INFO - GECKO(1159) | #26: nsComponentManagerImpl::Shutdown() [xpcom/components/nsComponentManager.cpp:933]
[task 2020-07-19T23:55:49.719Z] 23:55:49 INFO - GECKO(1159) | #27: mozilla::ShutdownXPCOM(nsIServiceManager*) [xpcom/build/XPCOMInit.cpp:737]
[task 2020-07-19T23:55:49.719Z] 23:55:49 INFO - GECKO(1159) | #28: ScopedXPCOMStartup::~ScopedXPCOMStartup() [toolkit/xre/nsAppRunner.cpp:1286]
[task 2020-07-19T23:55:49.721Z] 23:55:49 INFO - GECKO(1159) | #29: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:4926]
[task 2020-07-19T23:55:49.722Z] 23:55:49 INFO - GECKO(1159) | #30: XRE_main(int, char**, mozilla::BootstrapConfig const&) [toolkit/xre/nsAppRunner.cpp:4963]
[task 2020-07-19T23:55:49.722Z] 23:55:49 INFO - fix-stacks error: failed to read breakpad symbols dir /Users/cltbld/tasks/task_1595202714/build/symbols/firefox for /Users/cltbld/tasks/task_1595202714/build/application/Firefox NightlyDebug.app/Contents/MacOS/firefox
[task 2020-07-19T23:55:49.722Z] 23:55:49 INFO - fix-stacks note: this is expected and harmless for system libraries on debug automation runs
[task 2020-07-19T23:55:49.722Z] 23:55:49 INFO - GECKO(1159) | #31: main [/Users/cltbld/tasks/task_1595202714/build/application/Firefox NightlyDebug.app/Contents/MacOS/firefox + 0x1624]
[task 2020-07-19T23:55:49.722Z] 23:55:49 INFO - TEST-INFO | started process screencapture
[task 2020-07-19T23:55:49.862Z] 23:55:49 INFO - TEST-INFO | screencapture: exit 0

Flags: needinfo?(twisniewski)

Do you know why the crash in ~Localization is happening? See comment 5 for the formatted crash.

The crash happened at the first attempt to land, did not happen when I retriggered, and happened again on the second attempt to land the patch.

Flags: needinfo?(gandalf)

I don't see a reason for that, but it's originated in mozilla::DropJSObjects(this); which we recently added in bug 1631593. it may be that the recent changes to shutdown are flipping some shutdown race.

Olli was reviewing the patch that added it, so maybe he'll have an idea how DropJSObjects gets called so late (should it just do nothing at this stage of the shutdown?)

Flags: needinfo?(gandalf) → needinfo?(bugs)

Why would it be originated to mozilla::DropJSObjects(this);?
The crash happens when we implicitly release Localization::mLocalization at the end of the destructor.

Flags: needinfo?(bugs)

There was a similar crash in Thunderbird, in bug 1629495, which got "fixed" by disabling some tests.

Both browser/components/shell/test/test_headless_screenshot.html and the failing Thunderbird tests ran without e10s.
Would you be fine with disabling the test, relanding the patch (that has nothing to do with Localization) and debugging this in a follow-up bug?
Or do you think that we can keep it disabled because non-e10s is not really supported any more?

Flags: needinfo?(gandalf)

Ah, so [task 2020-07-19T23:55:49.709Z] 23:55:49 INFO - GECKO(1159) | #04: mozilla::intl::Localization::~Localization() [intl/l10n/Localization.cpp:146] is not https://searchfox.org/mozilla-central/source/intl/l10n/Localization.cpp#145

It's the implicit release of mLocalization which happens after.

Would it help to release it in Destroy which would then be covered by NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN?

Flags: needinfo?(gandalf)

I don't think that will make a difference, as you're already running NS_IMPL_CYCLE_COLLECTION_UNLINK(mLocalization) there, which will null out mLocalization if the type is ever unlinked: https://searchfox.org/mozilla-central/rev/3b6958c26049c1e27b2790a43154caaba9f6dd4a/intl/l10n/Localization.cpp#24

A possible patch for the crash (just based on code inspection by Nika and me)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf8158b6fbdd960e355dd159aa50d5fa9e6835cc

oops, I had something unrelated on that tree too. Oh well, that shouldn't affect to shutdown.

Rob, if you know how to reproduce the crash, even if just on tryserver, could you try this patch https://hg.mozilla.org/try/rev/8abbccf0d0572087f0daf1d58eb06a695d03459b

Flags: needinfo?(rob)

Two attempts to landing it failed, but a retrigger succeeded, so I don't think that I am able to consistently reproduce.

If your patch doesn't cause issues (i.e. your try push with the patch in isolation was green), then I suggest to send your patch to Phabricator, let the patch stack here depend on your patch, and then land all patches together. If your patch fixes the issue, the patchset will stick. If it doesn't fix the issue, then this patch may eventually be backed out.

Flags: needinfo?(rob)

Does any of that need to happen before the merge? I'd rather land the patch after the merge.

I think that :twisniewski would prefer to land this before the branch, but since it's a huge patch and we're investigating an unknown issue, I would recommend to wait after the branch instead. Then we have four more weeks to comfortably investigate issues in case things go south.

Agreed. I would have preferred to land this before the branch, but if there is risk involved with this fix, then let's wait.

Flags: needinfo?(twisniewski)

Given that the patch in bug 1655778 has successfully landed, I'm going to try landing this patch again to confirm whether it fixes the test-failure (with the trivial rebasing required since I tried landing last time).

Pushed by twisniewski@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a7d72e255375 add support for shimming resources blocking by tracking protection to webcompat built-in addon; r=dimi,robwu,webcompat-reviewers,denschub,miketaylr
Regressions: 1656863
Depends on: 1657930
Regressions: 1661330
Flags: needinfo?(jwatt)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: