Closed
Bug 1399904
Opened 7 years ago
Closed 7 years ago
Crash in java.lang.IllegalStateException: Invalid account profile data at org.mozilla.gecko.fxa.authenticator.AndroidFxAccount.getAccountUID(AndroidFxAccount.java)
Categories
(Firefox for Android Graveyard :: Firefox Accounts, defect)
Tracking
(firefox-esr52 unaffected, firefox55 unaffected, firefox56 unaffected, firefox57 fixed)
RESOLVED
FIXED
Firefox 57
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | --- | unaffected |
firefox57 | --- | fixed |
People
(Reporter: calixte, Assigned: Grisha)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: [clouseau])
Crash Data
Attachments
(1 file)
This bug was filed from the Socorro interface and is
report bp-ff934e0b-a960-4e54-8777-a0a370170914.
=============================================================
There are 5 crashes in nightly 57 with buildid 20170914100116. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1368147.
[1] https://hg.mozilla.org/mozilla-central/rev/d881fed7d5ca
Flags: needinfo?(gkruglov)
Comment 1•7 years ago
|
||
grisha: two thoughts:
- I wonder if this handles the self-hosting case, where many folks won't have a profile endpoint configured;
- I wonder if we lose the UID when we move the account into the Doghouse state (I hope not, that would be an oversight).
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → gkruglov
Status: NEW → ASSIGNED
Flags: needinfo?(gkruglov)
Assignee | ||
Comment 2•7 years ago
|
||
It's crashing after failing to unpickle fxa.account.json file. Our prior behavior was to essentially pretend that there is no account present if this happened. Bug 1368147 now throws a runtime exception whenever we fail to parse the fxa.account.json file, which is uncaught and leads to these crashes. I think this just surfaced whatever problems we already had.
Failing to unpickle can happen for a few reasons:
- file's not there, or couldn't be read
- we failed to parse it as JSON
- we failed to unpickle the parsed JSON
-- this in turn may happen for a variety of reasons:
--- account type has changed
--- missing pickle version
--- missing 'bundle' key
--- invalid state information (missing both stateLabel & state). We _know_ this can happen - see Bug 1349147 for the same assertion triggering a crash
--- failures while re-constructing State persisted object
I don't yet see evidence that a UID may be actually missing once we successfully parse the pickled file; we have another strong assertion for that right after this assertion, and I don't (yet) see any crashes associated with it.
If I understand the self-hosting case correctly (admittedly, I only have a vague understanding of it other than "we have different endpoints configured"), then I doubt it's involved here. An operation, unchanged by Bug 1368147, is failing, and the difference is that before we'd pretend account wasn't there, and now we'd just crash. It's possible that self-hosted users would see strange behaviour before, which they'd perhaps resolve by re-logging in, and no bugs would get filed...
Comment 3•7 years ago
|
||
(In reply to :Grisha Kruglov from comment #2)
> It's crashing after failing to unpickle fxa.account.json file. Our prior
> behavior was to essentially pretend that there is no account present if this
> happened. Bug 1368147 now throws a runtime exception whenever we fail to
> parse the fxa.account.json file, which is uncaught and leads to these
> crashes. I think this just surfaced whatever problems we already had.
>
> Failing to unpickle can happen for a few reasons:
>
> - file's not there, or couldn't be read
This could happen -- are we certain this will only happen _after_ a pickle file is written?
> - we failed to parse it as JSON
I tried really hard to understand errors with JSON parsing. See https://bugzilla.mozilla.org/show_bug.cgi?id=964854, which I never did understand. We could help ourselves by not using org.json.simple.
> - we failed to unpickle the parsed JSON
> -- this in turn may happen for a variety of reasons:
> --- account type has changed
> --- missing pickle version
> --- missing 'bundle' key
> --- invalid state information (missing both stateLabel & state). We _know_
> this can happen - see Bug 1349147 for the same assertion triggering a crash
> --- failures while re-constructing State persisted object
>
> I don't yet see evidence that a UID may be actually missing once we
> successfully parse the pickled file; we have another strong assertion for
> that right after this assertion, and I don't (yet) see any crashes
> associated with it.
>
> If I understand the self-hosting case correctly (admittedly, I only have a
> vague understanding of it other than "we have different endpoints
> configured"), then I doubt it's involved here. An operation, unchanged by
> Bug 1368147, is failing, and the difference is that before we'd pretend
> account wasn't there, and now we'd just crash. It's possible that
> self-hosted users would see strange behaviour before, which they'd perhaps
> resolve by re-logging in, and no bugs would get filed...
Assignee | ||
Comment 4•7 years ago
|
||
Interesting read, thanks for linking Nick.
I think we have two paths we can take here.
- find out more. Why exactly are we failing to unpickle? This involves some better crash reporting (funneling our exceptions upwards), and at least one more crashy nightly cycle. This might help us unsurface some other issues, or at least get a better understanding of what's happening. But, most likely we'd still need to:
- be more lenient about this. The previous behaviour was "if we fail to unpickle, pretend account isn't there". Somehow, we'd probably need to maintain that behavior, hopefully allowing people to recover manually (disconnect/sign-in).
Hopefully "knowing more" will indicate if pushing ahead with phasing out org.json.simple is a worthy effort. Likely that involves slowly phasing out our use of EJO.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 6•7 years ago
|
||
Given the low crash volume - at least it appears low - let's gather a bit more information first. Patch above "unswallows" bunch of exceptions, but only for the affected code path (e.g. unpickling initiated by a call to getAccountUID).
Comment 7•7 years ago
|
||
mozreview-review |
Comment on attachment 8908327 [details]
Bug 1399904 - Surface detailed profile unpickling exceptions
https://reviewboard.mozilla.org/r/179964/#review185170
I'm happy with this approach. Sadly, I think you're going to see a bunch of JSON parsing errors -- but if we could find somebody to help debug them, that would be great!
::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/Utils.java:479
(Diff revision 1)
> *
> * @param context Android context.
> * @param filename name of file to read; must not be null.
> * @return <code>String</code> instance.
> */
> - public static String readFile(final Context context, final String filename) {
> + public static String readFile(final Context context, final String filename) throws IOException {
Only used by pickler? Looks like it.
Attachment #8908327 -
Flags: review?(nalexander) → review+
Assignee | ||
Comment 8•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8908327 [details]
Bug 1399904 - Surface detailed profile unpickling exceptions
https://reviewboard.mozilla.org/r/179964/#review185170
> Only used by pickler? Looks like it.
Yup, pickler and its tests.
Pushed by gkruglov@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/57348c4c7288
Surface detailed profile unpickling exceptions r=nalexander
Comment 10•7 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 57
Assignee | ||
Comment 11•7 years ago
|
||
So far there haven't been any crashes reported with the more useful signatures from Comment 10. Three things come to mind:
- underlying problems are fairly rare
- we'll see a spike in crashes again when 57 goes to beta on Android, and we'll see an influx of upgrades; and again on 57 release
- we should be more lenient around handling of these problems in order to avoid said spikes. I filed Bug 1400986 to track this work
- consider making some progress towards phasing out our use of org.json.simple in the critical paths. See Comment 3
Assignee | ||
Comment 12•7 years ago
|
||
Ah, there _are_ follow-up crashes - I haven't been looking for the correct signature. Bug 1400742 is one - supposedly missing fxa.account.json.
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•