Closed
Bug 1459655
Opened 7 years ago
Closed 7 years ago
Crash in mozilla::ipc::PrincipalInfoToPrincipal
Categories
(Core :: DOM: Core & HTML, defect, P1)
Core
DOM: Core & HTML
Tracking
()
VERIFIED
FIXED
mozilla62
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | unaffected |
firefox60 | --- | unaffected |
firefox61 | blocking | verified |
firefox62 | + | verified |
People
(Reporter: marcia, Assigned: Mardak)
References
Details
(Keywords: crash, regression)
Crash Data
User Story
Attachments
(2 files)
(deleted),
patch
|
baku
:
review+
RyanVM
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
(deleted),
text/x-review-board-request
|
Gijs
:
review+
k88hudson
:
review+
RyanVM
:
approval-mozilla-beta+
|
Details |
This bug was filed from the Socorro interface and is
report bp-0519cdd9-340c-41e4-8b54-f32000180506.
=============================================================
Seen while looking at nightly crash stats, crashes started on Windows and Mac using 20180504220115: https://bit.ly/2IhHmlz (Windows). Mac crashes: https://bit.ly/2jAoT5A. Moz Crash reason: MOZ_CRASH(Origin must be available when deserialized)
Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a2d1d4158bb4718d8bb31e1284e133aa99273600&tochange=5207b1392b11db534550a5eb801302e6dbb58f95
Top 10 frames of crashing thread:
0 xul.dll mozilla::ipc::PrincipalInfoToPrincipal ipc/glue/BackgroundUtils.cpp:106
1 xul.dll mozilla::ipc::LoadInfoArgsToLoadInfo ipc/glue/BackgroundUtils.cpp:464
2 xul.dll mozilla::net::HttpChannelParent::DoAsyncOpen netwerk/protocol/http/HttpChannelParent.cpp:521
3 xul.dll mozilla::net::HttpChannelParent::Init netwerk/protocol/http/HttpChannelParent.cpp:131
4 xul.dll mozilla::net::NeckoParent::RecvPHttpChannelConstructor netwerk/ipc/NeckoParent.cpp:323
5 xul.dll mozilla::net::PNeckoParent::OnMessageReceived ipc/ipdl/PNeckoParent.cpp:920
6 xul.dll mozilla::dom::PContentParent::OnMessageReceived ipc/ipdl/PContentParent.cpp:3497
7 xul.dll mozilla::ipc::MessageChannel::DispatchAsyncMessage ipc/glue/MessageChannel.cpp:2141
8 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW ipc/glue/MessageChannel.cpp:2071
9 xul.dll mozilla::ipc::MessageChannel::RunMessage ipc/glue/MessageChannel.cpp:1917
=============================================================
Reporter | ||
Updated•7 years ago
|
OS: Windows 10 → All
Hardware: x86 → All
Comment 1•7 years ago
|
||
Christoph, any ideas? Nothing in the files in the top few stack frames seems to have changed on 5-4.
Flags: needinfo?(ckerschb)
Priority: -- → P2
Updated•7 years ago
|
status-firefox62:
--- → affected
status-firefox-esr52:
--- → unaffected
status-firefox-esr60:
--- → unaffected
Comment 2•7 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #1)
> Christoph, any ideas? Nothing in the files in the top few stack frames seems
> to have changed on 5-4.
I browsed through PrincipalInfoToPrincipal with particular focus around line 106, but nothing obvious seems wrong here to me. Would be curious to see what's going on. I suppose we don't have STR handy, right? Otherwise I would get right to it.
Flags: needinfo?(ckerschb)
Comment 3•7 years ago
|
||
Not sure it explains this, but bug 1456986 is in the range and I just backed it out. Maybe that will help here.
Comment 4•7 years ago
|
||
Looks like this is still crashing on Nightlies with that backout included.
Comment 5•7 years ago
|
||
I've look at the push list in comment 2 and I don't really see anything that explains this. If the optional principal value was garbage on the sending side I would expect it to crash in the child process.
Maybe we should add a similar MOZ_CRASH() on missing origin in the PrincipalToPrincipalInfo() serialization side of things so we can get crash reports closer to the place where the bad value is being produced.
Andrea, what do you think?
Flags: needinfo?(amarchesini)
Comment 6•7 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #5)
> Maybe we should add a similar MOZ_CRASH() on missing origin in the
> PrincipalToPrincipalInfo() serialization side of things so we can get crash
> reports closer to the place where the bad value is being produced.
FWIW, that sounds like a good path forward to me - thanks.
Comment 7•7 years ago
|
||
> Andrea, what do you think?
Yes, this seems a excellent starting point. I would also convert MOZ_ASSERT to MOZ_DIAGNOSTIC_ASSERT(mInitialized) in BasePrincipal::GetOrigin and in BasePrincipal::GetOriginNoSuffix
Flags: needinfo?(amarchesini)
Comment 8•7 years ago
|
||
This is spiking big-time on 61.0b3. If we can get a diagnostic patch created quickly, we can uplift to Beta as well ASAP.
Comment 9•7 years ago
|
||
I'll work on creating a diagnostic patch this morning. Not sure if I should do it in a dependent bug or here, though.
Comment 10•7 years ago
|
||
the crash signature is accounting for 70% of browser crashes in 61.0b3.
this would be a more narrow pushlog than in comment #0 from build 20180504100129 to build 20180504220115:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8994f3&tochange=5207b1
Comment 11•7 years ago
|
||
Actually, I see an old bug that could explain why this is triggering a crash instead of failing further upstream. I'll fix that.
Comment 12•7 years ago
|
||
The return value of the triggering principal PrincipalToPrincipalInfo() is not checked here:
https://searchfox.org/mozilla-central/rev/eb6c5214a63e20a3fff455e92c876287a8d2e188/ipc/glue/BackgroundUtils.cpp#316
Updated•7 years ago
|
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Updated•7 years ago
|
Priority: P2 → P1
Comment 13•7 years ago
|
||
We should, like, check for error and stuff.
Attachment #8974701 -
Flags: review?(amarchesini)
Comment 14•7 years ago
|
||
Comment 15•7 years ago
|
||
The missed error checking was introduced in bug 1179505. Not sure what changed recently to make the principal fail to serialize, though.
Blocks: 1179505
Updated•7 years ago
|
Attachment #8974701 -
Flags: review?(amarchesini) → review+
Comment 16•7 years ago
|
||
I'm going to wait for try in case this causes the error condition to show as an orange somehow. Although thats probably unlikely given we don't see the crash in automation.
Comment 17•7 years ago
|
||
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e0e6097f282
Check for error when serializing the LoadInfo triggering principal. r=baku
Comment 18•7 years ago
|
||
Comment on attachment 8974701 [details] [diff] [review]
Check for error when serializing the LoadInfo triggering principal. r=baku
Approval Request Comment
[Feature/Bug causing the regression]: Partially bug 1179505 for removing some error checking, but also some newer regression which causes PrincipalToPrincipalInfo() return an error code. (I think).
[User impact if declined]: Not catching the error causes us to hit a MOZ_CRASH() on the parent side. This is a top crasher in beta.
[Is this code covered by automated tests?]: This problem does not trigger in automation.
[Has the fix been verified in Nightly?]: Not yet. I only pushed this to inbound just now. Since this is a top crash, though, we want to get this potential fix in the next beta if possible.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Minimal
[Why is the change risky/not risky?]: One line change to check for a failed error code.
[String changes made/needed]: None
Attachment #8974701 -
Flags: approval-mozilla-beta?
Comment 19•7 years ago
|
||
Comment on attachment 8974701 [details] [diff] [review]
Check for error when serializing the LoadInfo triggering principal. r=baku
Given the frequency of this crash, let's take the speculative fix now for today's 61.0b4 build.
Attachment #8974701 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 20•7 years ago
|
||
bugherder uplift |
Comment 21•7 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Comment 22•7 years ago
|
||
We've still got crashes with the latest nightly including this change, e.g. https://crash-stats.mozilla.com/report/index/c3034268-869c-451f-a0c9-f3a770180511
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla62 → ---
Comment 23•7 years ago
|
||
So we're crashing in this code:
nsAutoCString originNoSuffix;
rv = principal->GetOriginNoSuffix(originNoSuffix);
if (NS_WARN_IF(NS_FAILED(rv)) ||
!info.originNoSuffix().Equals(originNoSuffix)) {
MOZ_CRASH("Origin must be available when deserialized");
}
It seems the principal is serializing fine, but its failing this conditional:
info.originNoSuffix().Equals(originNoSuffix)
So maybe its not that the origin is missing, but that it does not match the origin contained in the spec string.
Comment 24•7 years ago
|
||
I'll make a diagnostic patch to assert that the origin and spec match.
Comment 25•7 years ago
|
||
I wonder if its something like the principal URL is "about:blank" and the origin string is something like "http://example.com".
Comment 26•7 years ago
|
||
Ted, is there any chance you have time to grab a mini dump from one of these crashes and look at the strings being compared just before the crash? I don't have access to the dumps and I'm also not familiar with the tools to inspect them after the fact.
Flags: needinfo?(ted)
Comment 27•7 years ago
|
||
i can reproduce this crash with a particular profile in a portable firefox version & *think* this was introdcued by bug 1438367.
Blocks: 1438367
Flags: needinfo?(edilee)
Comment 28•7 years ago
|
||
This might fallout from removing nsIAboutModule::MAKE_LINKABLE. It seems that impact how the nsIURI is formed:
https://searchfox.org/mozilla-central/rev/3f17a234769d25fca5144ebb8abc8e1cb3c56c16/netwerk/protocol/about/nsAboutProtocolHandler.cpp#44
https://searchfox.org/mozilla-central/rev/3f17a234769d25fca5144ebb8abc8e1cb3c56c16/netwerk/protocol/about/nsAboutProtocolHandler.cpp#139-161
And then the principal code is expecting certain behavior for the about:blank URI:
https://searchfox.org/mozilla-central/rev/3f17a234769d25fca5144ebb8abc8e1cb3c56c16/caps/ContentPrincipal.cpp#107-113
But that is a DEBUG-only assert, so we might be blowing right past that and hitting the MOZ_CRASH instead.
Dropping NI on Ted since we have a reproducing profile.
Flags: needinfo?(ted)
Comment 29•7 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #25)
> I wonder if its something like the principal URL is "about:blank" and the
> origin string is something like "http://example.com".
Riffing on this given comment #27 and complete lack of STR and so on (so can't really see exactly what's going on), I wonder if this is about about:home vs. moz-safe-about:home or something?
Comment 30•7 years ago
|
||
FWIW, if this is broken on about:home today I don't see why it wouldn't also be broken on about:newtab even if we back out the patch for bug 1438367, given that about:newtab essentially loads exactly the same stuff and was already using the same flags and thus type of nested (or not) URI.
Comment 31•7 years ago
|
||
I can confirm that this is fixed by backing out bug 1438367. The following patch adding back MAKE_LINKABLE also fixes it.
We should probably backout for now and re-land with a test that would catch this, though. We clearly don't have adequate test coverage for this case. Not sure what the trigger is, but this profile has two blank "highlight" panels.
--- a/browser/components/about/AboutRedirector.cpp
+++ b/browser/components/about/AboutRedirector.cpp
@@ -79,17 +79,17 @@ static const RedirEntry kRedirMap[] = {
nsIAboutModule::HIDE_FROM_ABOUTABOUT },
{ "sessionrestore", "chrome://browser/content/aboutSessionRestore.xhtml",
nsIAboutModule::ALLOW_SCRIPT |
nsIAboutModule::HIDE_FROM_ABOUTABOUT },
{ "welcomeback", "chrome://browser/content/aboutWelcomeBack.xhtml",
nsIAboutModule::ALLOW_SCRIPT |
nsIAboutModule::HIDE_FROM_ABOUTABOUT },
// Actual activity stream URL for home and newtab are set in channel creation
- { "home", "about:blank", ACTIVITY_STREAM_FLAGS },
+ { "home", "about:blank", ACTIVITY_STREAM_FLAGS | nsIAboutModule::MAKE_LINKABLE },
{ "newtab", "about:blank", ACTIVITY_STREAM_FLAGS },
{ "library", "chrome://browser/content/aboutLibrary.xhtml",
nsIAboutModule::URI_MUST_LOAD_IN_CHILD |
nsIAboutModule::URI_SAFE_FOR_UNTRUSTED_CONTENT },
{ "preferences", "chrome://browser/content/preferences/in-content/preferences.xul",
nsIAboutModule::ALLOW_SCRIPT },
{ "downloads", "chrome://browser/content/downloads/contentAreaDownloadsView.xul",
nsIAboutModule::ALLOW_SCRIPT },
Assignee: bkelly → edilee
Assignee | ||
Comment 32•7 years ago
|
||
There were some changes that were dependent on MAKE_LINKABLE going away, so if we revert bug 1438367, I'll also need to revert some other things too at the same time. I'm pretty sure we haven't had any other changes to beta, so it should uplift cleanly although it will touch ~100 files.
Flags: needinfo?(edilee)
Assignee | ||
Updated•7 years ago
|
Comment hidden (mozreview-request) |
Comment 34•7 years ago
|
||
mozreview-review |
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.
https://reviewboard.mozilla.org/r/243490/#review249348
Looks good to me
Attachment #8975121 -
Flags: review?(khudson) → review+
Comment 35•7 years ago
|
||
mozreview-review |
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.
https://reviewboard.mozilla.org/r/243490/#review249386
grumbly rs=me - I'd still like to understand how this is causing crashes, and I still don't see STR in the bug... anyway, I guess that's something for a follow-up.
Also, I'm not sure why all the JS needs wrapping in an IIFE / self-executing function expression. But I mean, it'll work for the purposes of what we need it to do...
Attachment #8975121 -
Flags: review?(gijskruitbosch+bugs) → review+
Comment 36•7 years ago
|
||
The STR are to open a browser using Philip's profile. Maybe he can send it to you. (Or I can when I'm back to my computer.)
Comment 37•7 years ago
|
||
Pushed by edilee@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ad8535a8e153
Restore about flag for about:home. r=k88hudson
Assignee | ||
Comment 38•7 years ago
|
||
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.
Approval Request Comment
[Feature/Bug causing the regression]: Bug 1438367 regressed
[User impact if declined]: Top crasher in beta
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Still on autoland right now
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Shouldn't be
[Why is the change risky/not risky?]: This is reverting two changes (the regressing bug and a dependent bug) resulting in a state that was previously good on Nightly.
[String changes made/needed]: None
Attachment #8975121 -
Flags: approval-mozilla-beta?
Comment 39•7 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Comment 40•7 years ago
|
||
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.
It would appear that we're still seeing crashes with this signature on Nightly builds containing the fix, but possibly at lower rates than before. It's really hard to say, though, given that it's the weekend and usage might just be lower.
Flags: needinfo?(edilee)
Comment 41•7 years ago
|
||
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.
I'm approving this for 61.0b5 because it does seem to have made a difference on Nightly, but I'm not entirely convinced we're out of the woods here yet.
Attachment #8975121 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 42•7 years ago
|
||
bugherder uplift |
Comment 43•7 years ago
|
||
OK, so the reason this crashes is that we start an http request where the triggering principal is about:home, and the triggering principal info that HttpChannelParent gets has originNoSuffix of moz-safe-about:home , but uri of about:home, which means the codebase principal that gets constructed gets originNoSuffix of about:home, which mismatches and therefore blows up.
Doing a JS stack dump in the content process for this load, it looks like it's triggered off a restoreTabContent call (see ContentRestore.jsm and content-sessionStore.js) . This makes me believe that what's happening is the following:
0. set session to autorestore
1. with Firefox version X, where about:home is non-nested, the user opens about:home and clicks a link that goes to foo.com
2. get an update ready to version X+1 where about:home *is* nested.
3. restart to update
4. crash, because about:home and moz-safe-about:home are not the same.
Notably, this means that now that we've switched back, people will still hit this crash but for the inverse process, where they opened a link from about:home and the meaning of the about:home origin changed again, so there's a mismatch again, so it's crashy again, which explains comment #40 (obviously there's fewer people with a session with the "new" version of about:home's principal than there are people with the "old" version of about:home's principal).
Comment 44•7 years ago
|
||
Would it make sense to just special case that specific about:home case for some time?
Comment 45•7 years ago
|
||
(In reply to Christoph Kerschbaumer [:ckerschb] from comment #44)
> Would it make sense to just special case that specific about:home case for
> some time?
Maybe, but it'd be nice not to get caught out like this again. Further discussion is in bug 1461407.
Comment 46•7 years ago
|
||
I think we have enough crash data to call this verified at this point. Note that there's still an open question about uplifting bug 1461407 to Beta, but the crash rates are definitely tailing off now and we shouldn't see this on release at all after 61 ships.
Status: RESOLVED → VERIFIED
Flags: needinfo?(edilee)
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•