News messages aren't fetched
Categories
(MailNews Core :: Networking: NNTP, defect, P1)
Tracking
(thunderbird_esr78 unaffected, thunderbird81 affected)
Tracking | Status | |
---|---|---|
thunderbird_esr78 | --- | unaffected |
thunderbird81 | --- | affected |
People
(Reporter: Paenglab, Assigned: benc)
References
Details
(Keywords: regression)
Attachments
(5 files, 4 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
mkmelin
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
benc
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
benc
:
review+
|
Details | Diff | Splinter Review |
Bug 1657493 fixed the crash when accessing news messages but since then the messages aren't downloaded in 99% of the times. News messages that are already downloaded before (locally available) are still readable. Before the fix of bug 1657493 there where already some messages that where still readable and didn't crash. So probably the same message are the ones that are still fetch-able.
This is a blocker as newsgroups, one of TB's base features, aren't usable.
Comment 1•4 years ago
|
||
This is a 100% regression of bug 1657493 attachment 9172251 [details] [diff] [review] because in fact it simply disables the news fetch. No fetch -> no crash.
That was a good idea to prevent crashes in the beta build. But now it's not helpful because we still need to fix the crash.
So in my humble opinion we should revert attachment 9172251 [details] [diff] [review], reopen bug 1657493 and then hunt the crash.
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Comment 2•4 years ago
|
||
Comment 3•4 years ago
|
||
I have attached a backtrace just before it actually crashes. The problem seems to be that DocumentLoadListener::RedirectToRealChannel is called indirectly from within DocumentLoadListener::OpenDocument, and attempts to resolve the mOpenPromise promise. However, DocumentLoadListener::OpenDocument only creates mOpenPromise at the very end, so within RedirectToRealChannel, mOpenPromise is still a null pointer, and resolving that is not going to go well. I'm not familiar enough with how this is intended to work to say whether this is a problem in DocumentLoadListener or in nsNNTPProtocol, but it should be possible to work around in DocumentLoadListener by creating mOpenPromise on first use (either in OpenDocument or in RedirectToRealChannel).
Comment 4•4 years ago
|
||
That really did turn out to be easy to work around: news messages work fine for me locally with the attached patch, tested on top of Thunderbird 81.0b2. Like I said, I have no idea whether this is actually the right way to fix things. If it is not, I hope it will at least help in getting a proper fix.
Assignee | ||
Comment 5•4 years ago
|
||
Thanks Harald - your diagnosis of the crash tallies with what I've been looking at (ie DocumentLoadListener::mOpenPromise being null being the crash point).
Unfortunately, the patch doesn't work for me (freshly subscribed to a big newsgroup, all messages unread). I can read the first message I click on, but subsequent messages crash.
I'm pretty sure it's something being done wrong in nsNNTPProtocol
rather than on the M-C side.
I'm still slowly picking my way up through the callstack, figuring out what it's trying to do. It definitely does some weird things with the caching (during nsNNTPProtocol::AsyncOpen()
, it eventually ends up using itself to feed into the cache entry, as well as being the endpoint channel being used by the docshell, which seems a little odd).
If I bypass the caching, it works the same as with workaround.patch
- it works for the first message I click on, then crashes trying to read subsequent ones.
One thing I'm looking at now: It might just be because our nsNNTPProtocol::AsyncOpen()
invokes the listener's OnStartRequest()
directly. The nsIChannel documentation isn't totally clear, but I suspect AsyncOpen() is expected to return before OnStartRequest()
is invoked... that would certainly ensure DocumentLoadListener::mOpenPromise
was set up.
Comment 6•4 years ago
|
||
Debug build?
With a NoDebug build I can read News with Haralds patch on Trunk minus attachment 9172251 [details] [diff] [review].
Only the cursor doesn't stop spinning over folder , header and thread pane.
The debug build stops at this assertion:
https://dxr.mozilla.org/comm-central/source/netwerk/base/nsNetUtil.cpp#252
Assignee | ||
Comment 7•4 years ago
|
||
some notes to myself, mainly :-)
Quick overview: it fails trying to recycle existing nntp connections for new requests.
The crash occurs because of this line in nsNNTPProtocol::LoadUrl()
:
if (m_fromCache) nsMsgProtocol::OnStartRequest(nullptr);
(this is the line that was commented out in Bug 1657493).
LoadUrl()
is called (eventually) by AsyncOpen(), and the offending line means that the listener's OnStartRequest()
is called early, which none of the channel stuff can handle - there's an (implict?) assumption that listener callbacks (OnStartRequest()/OnStopRequest()/OnDataAvailable) aren't invoked until after AsyncOpen() returns, which seems like a reasonable condition to me.
So the line above violates that condition.
The m_fromCache
flag indicates that this is a cached connection (not cached data! This is really confusing, seeing how many cached-data calls are in the crashing callstack!). A cached connection is an already-open nsNNTPProtocol object, connected to the server, and m_fromCache
means we're reusing it.
When setting up a new request, it'll first see if any of the cached (ie already open) connections are idle. If not, and we're still below the "max_cached_connections" pref setting (default 2), it'll create a new nsNTTPProtocol
object and use that. As a last resort, it'll instead create a nsNNTPMockChannel
to hold all the request details and place it in a queue for when a real nsNTTPProtocol
object becomes available.
Also, whenever nsNTTPProtocol
objects finish their request, they submit themselves for reuse by any queued nsNNTPMockChannels
(via nsNntpIncomingServer::PrepareForNextUrl()
).
Cached nsNTTPProtocol
objects also have a timeout - if they're idle long enough, they're discarded.
I can get it all working again by hacking things to never recycle connections (I just hacked the timeout code to always trigger, and set my "max_cached_connections" to something huge).
Tomorrow I shall put together a stopgap patch with non-recycling connections to at least get it working again.
The if (m_fromCache) nsMsgProtocol::OnStartRequest(nullptr);
line appears to have been added to try and fix up some bad state when recycling nsNNTPProtocol objects. The associated comment says it's the loadGroup, but I'm not totally convinced by this. A little code archeology to do there.
Anyway, what I need to figure out now: what is wrong with the state of recycled nsNNTPProtocol
objects that causes them to stall? It'll turn out to be something stupidly simple, I'm sure : -)
Assignee | ||
Comment 8•4 years ago
|
||
Nasty little hack to disable all NNTP connection reuse. Slow and awful, obviously, but it works.
Not aiming to land it (hoping to work out a proper fix), but dumping it here just in case.
Assignee | ||
Comment 9•4 years ago
|
||
A few more notes:
The if (m_fromCache) nsMsgProtocol::OnStartRequest(nullptr);
line which causes all the trouble dates from waaaaay back. Changeset 0, the original import from CVS in 2008 or so. Which makes me less nervous about removing it and figuring out something cleaner.
The nntp connection reuse happens in two ways:
- in
nsNntpIncomingServer::GetNntpConnection()
- new requests check to see if any of the existing channels are idle (and haven't timed out). If so, it reuses one. - When
nsNNTPProtocol
finishes a request, it checks to see if thensNntpIncomingServer
has any queued requests (asnsNntpMockChannel
objects), and if so tells a mock channel to reuse it.
The two paths use slightly different mechanisms.
I suspect it'd be cleaner to unify them - always return an nsNntpMockChannel, and underneath bind them to nsNNTPProtocol objects as they become available (also some renaming to help clarify the roles: nsNntpMockChannel
-> nsNNTPChannel
and nsNNTPProtocol
-> nsNNTPConnection
, say).
Also, it looks like there's a logic bug in nsNntpIncomingServer::GetNntpConnection()
- if there's an idle connection but it's timed out, it doesn't go back to look for other idle connections. I could see a situation where new requests would get queued, but never actually acted upon...
Comment 10•4 years ago
|
||
Joshua, any insights?
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 11•4 years ago
|
||
Quick update - I've got a handle on the causes of this now and am working on a solution:
To serve up individual messages, the nsNNTPProtocol
(which handles real NNTP connections) is presented to the docshell as a nsIChannel
. However, because the nsNNTPProtocol
is a persistent connection (a sockettransport, managed by it's base nsMsgProtocol class), it doesn't appear (to the docshell) to follow the usual OnStartRequest/OnDataAvailable.../OnStopRequest patterns. Previously this was fudged by manually calling OnStartRequest inside our AsyncOpen(), to give the docshell a kick, but this was always a little dodgey. nsIChannels are meant to be single-shot things, handling a single request then going away. Not for representing persistent connections like we have here. We were lucky it worked for a long time, but now the fudged OnStartRequest-during-AsyncOpen() no longer works.
Got a couple of approaches to try, but I think the best one is to continue my current intention of using making sure the docshell always just deals through nsIMockChannel rather than sometimes using nsNNTPProtocol directly.
Assignee | ||
Comment 12•4 years ago
|
||
Here's the least-intrusive fix I can come up with.
It still leaves a whole heap of dodgy stuff in the NNTP code, but it Works For Me (tm) and I don't think it'll break anything else that's not already broken.
I'll craft a few followup patches to add some notes and tidy up a few loose ends I spotted, but unless there are any big issues, I'll leave the big intrusive nsNNTPProtocol/nsMockChannel cleanup for now. I think its best to consider that in terms of a larger cleanup of all the nsMsgProtocol-based classes.
Comment 13•4 years ago
|
||
Updated•4 years ago
|
Comment 14•4 years ago
|
||
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/81b009ff299a
Fix stalled message display when using persistent NNTP connections. r=mkmelin
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 15•4 years ago
|
||
Just a followup to add some comments on things that really confused me when I was digging about in NNTP code.
Assignee | ||
Comment 16•4 years ago
|
||
Cruft removal: nsINNTPProtocol.isCachedConnection
is no longer used.
Comment 17•4 years ago
|
||
Comment 18•4 years ago
|
||
Assignee | ||
Comment 19•4 years ago
|
||
Assignee | ||
Comment 20•4 years ago
|
||
Assignee | ||
Comment 21•4 years ago
|
||
(In reply to Magnus Melin [:mkmelin] from comment #18)
Good to get it documented, but not really ok per coding conventions in
xpcom. An ok should not really allow null out.
Technically it's not part of the XPCOM interface (It's just a helper function in nsNNTPConnection), but yeah, I agree.
For now I just wanted to keep that patch to comments, but that function would be one of the first to go in a proper cleanup of the NNTP code.
Assignee | ||
Updated•4 years ago
|
Comment 22•4 years ago
|
||
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/f81bd27cd9a9
Followup. Add some comments on NNTP implementation. r=mkmelin
https://hg.mozilla.org/comm-central/rev/2719fb20cee2
Followup. Remove now-unused nsINNTPProtocol.isCachedConnection attribute. r=mkmelin
Updated•4 years ago
|
Description
•