Closed Bug 1633198 Opened 5 years ago Closed 4 years ago

Port bug 1632098 - Enable ParentProcessDocumentChannel

Categories

(Thunderbird :: Upstream Synchronization, defect, P1)

Tracking

(thunderbird_esr78 wontfix)

RESOLVED FIXED
81 Branch
Tracking Status
thunderbird_esr78 --- wontfix

People

(Reporter: intermittent-bug-filer, Assigned: mkmelin)

References

Details

(Keywords: leave-open)

Attachments

(1 file)

Filed by: mkmelin+mozilla [at] iki.fi
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=299406854&repo=comm-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Cub7hhsTQByqYzNG3r9Lww/runs/0/artifacts/public/logs/live_backing.log


Many similar failures. comm/mail/test/browser/message-header/browser_phishingBar.js is another one

Severity: normal → critical
Priority: P5 → P1
Attached patch bug1633198_ppi.patch (deleted) — Splinter Review

Going to land as bustage fix.

Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #9143401 - Flags: review?(geoff)

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/353338823ec9
set browser.tabs.documentchannel.ppdc false (tests opening new windows broken without it). rs=bustage-fix

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

This fixes also the Feed message shown as web page. With the pref set to true it shows only garbage.

Target Milestone: --- → Thunderbird 77.0

This is acceptable in the short term, although the breaking changes have been switched off again for now, but we're going to have to fix it properly.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: comm/calendar/test/browser/invitations/browser_imipBarEmail.js (etc.) | Uncaught exception - at resource://testing-common/mozmill/utils.jsm:386 - TimeoutError: Timed out waiting for window open! → Port bug 1632098 - Enable ParentProcessDocumentChannel
Attachment #9143401 - Flags: review?(geoff)

I've managed to fix most of the tests with the pref turned on, through a variety of hacks, but mostly by taking them off our ancient system for waiting for windows to open. See https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=95e20c341344fc3a41239bca7f7e3eefff406835.

Some of the tests are crashing persistently, at mozilla::net::ParentProcessDocumentChannel::DisconnectChildListeners(nsresult, nsresult).

It seems that the event loops of new windows completely stall unless I call setTimeout on them. No matter how long you wait, nothing happens. This could be something to do with the remains of Mozmill hanging around, but I don't think it is. Some of the timeouts I added aren't necessary with the pref set to false.

(In reply to Geoff Lankow (:darktrojan) from comment #7)

I've managed to fix most of the tests with the pref turned on, through a variety of hacks, but mostly by taking them off our ancient system for waiting for windows to open. See https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=95e20c341344fc3a41239bca7f7e3eefff406835.

Some of the tests are crashing persistently, at mozilla::net::ParentProcessDocumentChannel::DisconnectChildListeners(nsresult, nsresult).

A regression was introduced over the weekend that broke the Parent Process DocumentChannel, I pushed the fixes earlier today.

Sorry for that. The pref being disabled by default means that it's not tested.

I pushed the code in order to avoid having to rebase constantly which on this heavily worked on code takes a considerable amount of time.

It seems that the event loops of new windows completely stall unless I call setTimeout on them. No matter how long you wait, nothing happens. This could be something to do with the remains of Mozmill hanging around, but I don't think it is. Some of the timeouts I added aren't necessary with the pref set to false.

Do you have some examples of tests you can link to? Happy to have a look at them.
I've fixed LOTS of old tests so far.

Unfortunately, non-e10s config isn't something we test much these days, the only code we have that runs in non-e10s are the accessibility tests. I certainly paid no time making sure it still worked.
We may have to disable DC if non-e10s. It doesn't bring much value without e10s

I'm hoping PPDC will be enabled by default tomorrow (I've pushed a pref change)

Assignee: mkmelin+mozilla → geoff

(In reply to Jean-Yves Avenard [:jya] from comment #8)

We may have to disable DC if non-e10s. It doesn't bring much value without e10s

This is probably the easiest thing to do, either by keeping the pref or hard-wiring it.

ISTR you told me the pref would be around until at least the ESR. If that's the case, from our point of view it would be best to call this bug fixed and deal with it again, if we have to, after shipping 78.

Flags: needinfo?(jyavenard)

(In reply to Geoff Lankow (:darktrojan) from comment #10)

(In reply to Jean-Yves Avenard [:jya] from comment #8)

We may have to disable DC if non-e10s. It doesn't bring much value without e10s

This is probably the easiest thing to do, either by keeping the pref or hard-wiring it.

ISTR you told me the pref would be around until at least the ESR. If that's the case, from our point of view it would be best to call this bug fixed and deal with it again, if we have to, after shipping 78.

I'd like to point out that DC only expose more obviously existing event handling issues.
Intermittent failures that would be made to permanently failing.

Flags: needinfo?(jyavenard)

These are current, open bugs with a Severity of critical. The Severity of these bugs is being changed to S2 to be consistent with the May 4 2020 Severity definitions.

Please let Release Management know if these bugs are still S2.

Severity: critical → S2
Priority: P1 → P3
Pushed by geoff@darktrojan.net: https://hg.mozilla.org/comm-central/rev/730d8d1b75c0 Modify tests to handle removal of ParentProcessDocumentChannel pref in bug 1654922. rs=bustage-fix

This isn't responsible for any of the other failures in the tree, so I'm calling it done.

Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: Thunderbird 77.0 → 81 Branch

Okay, it is caused by bug 1654922, even though I did a Try run with that bug backed out and it still failed.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

So what do we do? This change here is breaking a number of things in Thunderbird. I suspect that's because we're non-e10s, but TBH I don't know enough about these things to be sure.

Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jyavenard)
Flags: needinfo?(agakhokidze)

I'm going to wait for :jya or :mattwoodrow to pitch in, but a week before I landed my patches for removing DC prefs, I filed bug 1655616 - "Investigate breakage caused by removing browser.tabs.documentchannel related prefs."

Flags: needinfo?(agakhokidze)

What are the remaining issues left broken in Thunderbird?

We're moving towards only supporting DocumentChannel style navigations, and significantly refactoring the code around that, so supporting disabling the pref isn't a workable long term option unfortunately.

Flags: needinfo?(matt.woodrow)

(In reply to Anny Gakhokidze [:annyG] from comment #18)

I'm going to wait for :jya or :mattwoodrow to pitch in, but a week before I landed my patches for removing DC prefs, I filed bug 1655616 - "Investigate breakage caused by removing browser.tabs.documentchannel related prefs."

You did, and I apologise for nobody getting to it earlier. The last few weeks have been very busy for all of us.

(In reply to Matt Woodrow (:mattwoodrow) from comment #19)

What are the remaining issues left broken in Thunderbird?

We're moving towards only supporting DocumentChannel style navigations, and significantly refactoring the code around that, so supporting disabling the pref isn't a workable long term option unfortunately.

Best I can really do is point you to the tests we have that are now failing and tell you what I know about them. There's a lot of old code here that I don't understand, I'm just a guy trying to keep our tree green.

I don't believe I can be of much help here.

When I last look at it; some of the tests was due to the incorrect assumptions that when you open a tab/window you can immediately read its content within the same event loop; when it should be waiting for the "load" event to be fired.

Like this code here looks suspicious to me:
https://hg.mozilla.org/comm-central/file/tip/mail/test/browser/message-header/browser_phishingBar.js#l251

Or in this code, there's a proper wait here https://searchfox.org/comm-central/source/mail/test/browser/content-policy/browser_jsContentPolicy.js#258-266

but not for this one:
https://searchfox.org/comm-central/source/mail/test/browser/content-policy/browser_jsContentPolicy.js#143-167

this looks suspicious. Mind you, this is just glancing at the code and I can't be certain that this is correct.

However, this is where I would start looking.

Flags: needinfo?(jyavenard)
Pushed by geoff@darktrojan.net: https://hg.mozilla.org/comm-central/rev/af2c561cebeb Handle null request context in nsMsgContentPolicy::ShouldLoad. rs=bustage-fix

The push above doesn't solve all of the problems, but it does help.

Pushed by geoff@darktrojan.net: https://hg.mozilla.org/comm-central/rev/8dd394c15e6b Return NS_OK instead of NS_ERROR_INVALID_POINTER when no context is available. rs=bustage-fix
Assignee: geoff → nobody

This is identified as a regression to bug 1657493 - every news article crashes.

Is there a super quick thing we can do to fix beta?

Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(benc)

I sent a band-aid fix to bug 1657493.

Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(benc)

(In reply to Wayne Mery (:wsmwk) from comment #25)

This is identified as a regression to bug 1657493 - every news article crashes.

Not every news article. Only new ones.
If "Select Newsgroup for offline use" is activated, articles from the local datastore can be shown as normal.

It it important for this bug (or some other) to be resolved because of the news breakage.

And I miscalculated in not taking the band-aid patch from bug 1657493 for 81.0b2 because I thought every article crashed. Will take it in the next beta - but I hope the larger issue will already be fixed by then. Because most users are not going to realize that they can do "Select Newsgroup for offline use". Further, most users likely won't care because they are going to news to get new messages, not the old ones.

Blocks: 1661955
Severity: S2 → S1
Flags: needinfo?(mkmelin+mozilla)
Priority: P3 → P1
Component: General → Upstream Synchronization

I think we did all we needed here. Certainly there's always room for test improvements, but nothing really actionable in this bug.

Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Flags: needinfo?(mkmelin+mozilla)
Resolution: --- → FIXED
Assignee: nobody → mkmelin+mozilla
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: