Closed
Bug 1081768
Opened 10 years ago
Closed 6 years ago
Crash with java.lang.NullPointerException @ org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525) when BrowserApp is immediately displaced as foreground activity
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(firefox35- affected, fennec+)
RESOLVED
WONTFIX
People
(Reporter: capella, Unassigned)
References
Details
(Keywords: crash, regression, topcrash-android-armv7, Whiteboard: [has bitrotted patches])
Crash Data
Attachments
(2 files, 2 obsolete files)
(deleted),
patch
|
jchen
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jchen
:
review+
|
Details | Diff | Splinter Review |
After a fresh install if I wait while FF opens on the new "Welcome" dialog, I get a NPE crash ...
https://pastebin.mozilla.org/6773505
Points basically to java.lang.NullPointerException @ org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525)
If after a fresh install I simply browse really fracking quickly ( ;-) )as suggested by rnewman, all is well.
He suggests: |the firstrun experience is preventing onWindowFocusChanged from being called on BrowserApp, and so it's not performing the init that it needs to.|
And that: |the solution is probably to fix that lifecycle stuff to not rely on oWFC.|
Comment 1•10 years ago
|
||
My crash on idle on the welcome screen has been happening for a while now [1] see bug 1077858.
[1] [@ java.lang.NullPointerException: at org.mozilla.gecko.db.BrowserDB.getCount(BrowserDB.java)]
Comment 2•10 years ago
|
||
Yup, they're related -- see my analysis in Bug 1077858 Comment 1.
Blocks: firstrun
Severity: normal → blocker
tracking-fennec: --- → ?
Crash Signature: [@ java.lang.NullPointerException: at org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525)]
status-firefox34:
--- → ?
status-firefox35:
--- → affected
Keywords: crash
Hardware: ARM → All
Summary: Crash on fresh nightly install ... java.lang.NullPointerException ... → Crash with java.lang.NullPointerException @ org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525)
Comment 4•10 years ago
|
||
Regression from bug 888482 based on the comments in bug 1078304 which was duped to this.
Blocks: 888482
Flags: needinfo?(nchen)
Updated•10 years ago
|
Comment 5•10 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> Regression from bug 888482 based on the comments in bug 1078304 which was
> duped to this.
I don't believe that to be the case; it's not about GeckoThread, it's about GeckoApp.onWindowFocusChanged not being called thanks to the first run experience.
If you eliminate the first run experience, the crash goes away.
Comment 6•10 years ago
|
||
In that case bug 1078304 is not a dupe.
Comment 7•10 years ago
|
||
No, they're the same:
E/GeckoLayerView( 5618): java.lang.NullPointerException
E/GeckoLayerView( 5618): at org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525)
E/GeckoLayerView( 5618): at dalvik.system.NativeStart.run(Native Method)
E/GeckoLayerView( 5618): at dalvik.system.NativeStart.run(Native Method)
but my argument is this:
The NPE is here in registerCxxCompositor:
LayerView layerView = GeckoAppShell.getLayerView();
GLController controller = layerView.getGLController(); <<<<
that means GeckoAppShell doesn't have a LayerView yet.
GAS.sLayerView is set in setLayerView, which in our case is called by GeckoApp.initializeChrome, which is called from BrowserApp's inherited initializeChrome and GeckoApp.initialize.
GeckoApp.initialize is called only from GeckoApp.onWindowFocusChanged.
Ergo, this NPE is a direct consequence of onWindowFocusChanged not being called before LayerView.registerCxxCompositor.
My hypothesis -- which capella verified in Comment 0 by tapping through quickly -- is that oWFC isn't being called until the FRE is dismissed.
Yes, there's still a race between UI code and the Gecko thread calling via JNI, but that race is non-existent in practice.
There is only one real fix for this bug: don't do this kind of work (or the DB setting in Bug 1077858) in oWFC, because it's the wrong place in the lifecycle.
Comment 8•10 years ago
|
||
Hm, ok. Thanks for the detailed explanation.
Updated•10 years ago
|
tracking-fennec: ? → +
Comment 11•10 years ago
|
||
Chenxia is going to work around this in Bug 1072831, which is tracking 34.
I'll take this and do some lifecycle analysis.
Assignee: nobody → rnewman
Status: NEW → ASSIGNED
Updated•10 years ago
|
tracking-fennec: ? → 35+
Comment 12•10 years ago
|
||
Looks like the problem is due to LayerView.registerCxxCompositor being called on the compositor thread, which does not wait for the UI to initialize on the main thread.
I think a quick bandaid would be to move the GeckoApp.intiializeChrome() call to GeckoApp.onCreate().
To actually avoid the race though, we should probably make GeckoAppShell.getLayerView() wait on the LayerView to become available.
Flags: needinfo?(nchen)
Comment 13•10 years ago
|
||
It's possible that the compositor thread's access of sLayerView occurs before it's been set by an application activity.
This patch allows for a certain amount of overlap, and lays the groundwork for different access patterns.
Note that this doesn't fix the bug in all cases -- if a user waits on the first run activity (and thus onWindowFocusChange isn't called) for more than 10 seconds + the amount of time it takes for the compositor thread to start, then they'll get an IllegalStateException.
The intention in this patch is not to paper over that issue.
Comment 14•10 years ago
|
||
This patch builds upon Part 1 by making registerCxxCompositor block forever (until interrupted) for the LayerView to become available.
This *does* paper over the user-visible bug.
Comment 15•10 years ago
|
||
Part 3 coming to address the lifecycle change. Might bump it to a different bug.
Comment 16•10 years ago
|
||
Comment on attachment 8508150 [details] [diff] [review]
Part 1: wait for LayerView to become available in registerCxxCompositor
I've verified that this builds and regular Fennec startups work fine.
Attachment #8508150 -
Flags: review?(nchen)
Updated•10 years ago
|
Attachment #8508151 -
Flags: review?(nchen)
Comment 17•10 years ago
|
||
Comment on attachment 8508150 [details] [diff] [review]
Part 1: wait for LayerView to become available in registerCxxCompositor
Review of attachment 8508150 [details] [diff] [review]:
-----------------------------------------------------------------
We should be careful about making getLayerView wait. setLayerView is called on the UI thread, and if something calls getLayerView on the UI thread before setLayerView, it's going to have a bad time. How about we have a separate method, say waitForLayerView, that specifically waits for the layer view to become available? In that case, only LayerView.registerCxxCompositor would call waitForLayerView.
::: mobile/android/base/GeckoAppShell.java
@@ -97,5 @@
> import android.media.MediaScannerConnection.MediaScannerConnectionClient;
> import android.net.ConnectivityManager;
> import android.net.NetworkInfo;
> import android.net.Uri;
> -import android.os.Bundle;
GeckoAppShell is using Bundle now after bug 1043457 landed.
@@ +313,5 @@
> + return;
> + }
> + sLayerView = lv;
> + } finally {
> + if (lv != null) {
I don't think lv can ever be null?
@@ +337,5 @@
> + return sLayerView;
> + }
> +
> + try {
> + sLayerViewMonitor.wait(LAYER_VIEW_AVAILABILITY_MSEC);
Note that the wait time is not guaranteed because of spurious wakeups.
Attachment #8508150 -
Flags: review?(nchen) → feedback+
Comment 18•10 years ago
|
||
(In reply to Jim Chen [:jchen] from comment #17)
> In that case, only LayerView.registerCxxCompositor
> would call waitForLayerView.
Fair point.
> GeckoAppShell is using Bundle now after bug 1043457 landed.
Yeah, I got a conflict just now when re-applying.
> I don't think lv can ever be null?
Why not? It's the argument. It's *not* null now, but it could be.
The point of that check is just to make sure we don't notify a waiter for a null value.
> Note that the wait time is not guaranteed because of spurious wakeups.
Yup, see the comment that snuck into Part 2.
Comment 19•10 years ago
|
||
Adds a second code path to wait, but synchronizes the current getLayerView.
Attachment #8508301 -
Flags: review?(nchen)
Updated•10 years ago
|
Attachment #8508150 -
Attachment is obsolete: true
Updated•10 years ago
|
Attachment #8508151 -
Attachment is obsolete: true
Attachment #8508151 -
Flags: review?(nchen)
Updated•10 years ago
|
Attachment #8508301 -
Flags: review?(nchen) → review+
Comment 21•10 years ago
|
||
Comment on attachment 8508303 [details] [diff] [review]
Part 2: make JNI calls to retrieve the layer block for longer. v2
Review of attachment 8508303 [details] [diff] [review]:
-----------------------------------------------------------------
When waiting indefinitely, we should check that | sLayerView != null | before returning, and wait again otherwise.
Actually, I'd rather not have a timeout at all (i.e. waitForLayerView always waits indefinitely). That should simplify things and you can put the .wait() call inside a loop that checks for | sLayerView != null |. Using a loop also guards against spurious wakeups, which are unpredictable. If waiting indefinitely causes a hang, something is definitely broken, and our hang detection is robust enough to let us know.
That's just my 2 cents; it's your call.
Attachment #8508303 -
Flags: review?(nchen) → review+
Comment 22•10 years ago
|
||
(In reply to Jim Chen [:jchen] from comment #21)
> That's just my 2 cents; it's your call.
Thanks for the input. I'll do some testing with infinite looping, see if anything seems bothersome.
Comment 23•10 years ago
|
||
I have this staged to land with infinite looping. Note that it will hang on startup without Bug 1085591, so gotta wait for review for that before I pull the trigger.
Comment 24•10 years ago
|
||
Comment 25•10 years ago
|
||
This looks healthy on Try apart from reftests, which seem to be doing something silly:
[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.layerManagerType]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://reftest/content/reftest.jsm :: BuildConditionSandbox :: line 608" data: no]
Comment 26•10 years ago
|
||
Adding the signature from bug 1078304 since it was duped here.
Crash Signature: [@ java.lang.NullPointerException: at org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525)] → [@ java.lang.NullPointerException: at org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525)]
[@ mozilla::widget::android::GLController::CreateEGLSurfaceForCompositorWrapper()]
Comment 27•10 years ago
|
||
Is this fixed via bug 1072831?
Comment 28•10 years ago
|
||
It'll be worked around by that bug. It'll be solved as part of Bug 1085591.
Comment 29•10 years ago
|
||
interim, can confirm no more crashing on start pane idle on trunk 10/29
Comment 30•10 years ago
|
||
Gently morphing this.
tracking-fennec: 35+ → 36+
status-firefox34:
unaffected → ---
status-firefox35:
affected → ---
status-firefox36:
affected → ---
Summary: Crash with java.lang.NullPointerException @ org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525) → Crash with java.lang.NullPointerException @ org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525) when BrowserApp is immediately displaced as foreground activity
Comment 31•10 years ago
|
||
Let's reconsider this in triage. It's something I want to fix, but we worked around it for now, and I don't think I have time to do it in 36.
tracking-fennec: 36+ → ?
Updated•10 years ago
|
tracking-fennec: ? → +
Comment 32•10 years ago
|
||
[Tracking Requested - why for this release]: this is the #1 topcrash in Fennec 35 -- it's also a startup crash and is likely preventing people from using Beta (if not driving users away altogether).
Fennec 35b2: 441 / 2524 crashes
Fennec 35b1: 1454 / 9257 crashes
Combined: 1895 / 11781 (16%)
I think this needs to be prioritized if it's not already being worked on.
status-firefox35:
--- → affected
tracking-firefox35:
--- → ?
Comment 33•10 years ago
|
||
How is this being worked around? The crash volume is lower from b1 to b2 so is that because of workaround or because we've lost people?
Flags: needinfo?(rnewman)
Flags: needinfo?(anthony.s.hughes)
Comment 34•10 years ago
|
||
(In reply to Lukas Blakk [:lsblakk] use ?needinfo from comment #33)
> How is this being worked around? The crash volume is lower from b1 to b2 so
> is that because of workaround or because we've lost people?
When I had posted comment 32 we had more data on 35b1 because we had only just released 35b2.
Here is the current breakdown by product for the last 7 days:
> FennecAndroid 35.0b2 59.37% 1533 crashes
> FennecAndroid 35.0b1 26.22% 677 crashes
> FennecAndroid 35.0b4 8.29% 214 crashes
> FennecAndroid 36.0a2 3.56% 92 crashes
> FennecAndroid 37.0a1 2.13% 55 crashes
> FennecAndroid 36.0a1 0.23% 6 crashes
> FennecAndroid 35.0a2 0.19% 5 crashes
We currently sit at 62717 ADIs across all Fennec Beta 35 versions, up from 57572 on December 15th. Crash rate is still significant but we're probably not losing a lot of people because of it.
Flags: needinfo?(anthony.s.hughes)
Comment 35•10 years ago
|
||
The workaround is that we no longer provoke this bug from the first-run experience, so we're back to the background hum of crashes.
Remaining crashes are likely to be due to app switching/notifications/etc -- ordinary Android lifecycle stages that we don't handle correctly.
This bug will suddenly become high-priority again as soon as we want to launch an activity within a minute of launching Firefox.
Flags: needinfo?(rnewman)
Comment 36•10 years ago
|
||
Given that, not tracking for 35 - we're not likely to cram in a fix for this at this stage. Since it's tracking-fennec ? it appears that this will stay on mobile team's radar and get prioritized later.
Updated•9 years ago
|
Assignee: rnewman → nobody
Status: ASSIGNED → NEW
Whiteboard: [has bitrotted patches]
Updated•9 years ago
|
Crash Signature: [@ java.lang.NullPointerException: at org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525)]
[@ mozilla::widget::android::GLController::CreateEGLSurfaceForCompositorWrapper()] → [@ java.lang.NullPointerException: at org.mozilla.gecko.gfx.LayerView.registerCxxCompositor(LayerView.java:525)]
[@ mozilla::widget::android::GLController::CreateEGLSurfaceForCompositorWrapper()]
[@ java.lang.NullPointerException: at org.mozilla.gecko.g…
Comment 37•6 years ago
|
||
Closing because no crash reported since 12 weeks.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Assignee | ||
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•