Closed Bug 1532386 Opened 6 years ago Closed 6 years ago

Rendering goes dead (page is blank on all tabs), must kill app to get it working again.

Categories

(Core :: Graphics, defect, P2)

66 Branch
ARM
Android
defect

Tracking

()

RESOLVED DUPLICATE of bug 1529892
Tracking Status
firefox65 --- unaffected
firefox66 + wontfix
firefox67 + fixed
firefox68 --- fixed

People

(Reporter: dveditz, Assigned: botond)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [geckoview:p2])

Attachments

(2 files)

Firefox for Android Beta 66.0b11 (but it's been happening for a week or two) on a Samsung Galaxy s7

Occasionally browser rendering will go black. The browser UI still works and I can switch tabs or try to load new pages, but they are all also black. I think the browser itself is working: when I load new pages the progress bar acts like normal and the tracking protection shield shows up (or not) appropriately for the site, but the web display area stays black.

When I get into that state I have to kill the app and restart to get it working.

I have not been able to reproduce this at will. The usual scenario:

  • I'm deep into my twitter feed
  • I click on an image to see the larger lightboxed view
  • I try to zoom in on part of it
    • at some point zooming gets "jumpy". As I zoom in the image jumps to a different part of the image, and I have to zoom out and try to scroll back to the part I wanted, zooming in may make it jump again.
    • sometimes dragging stops working when the image has been zoomed bigger than the browser width. I have to pinch a little (so it starts shrinking) while dragging to get it to move
    • at some point it will go black and that's it.

Do we have a separate rendering process on Android? The behavior feels as if that part just crashed or hung and didn't come back. No crashes are reported to crash-stats, so this could be happening to lots of people and we wouldn't know.

Maybe something is running out of memory? I don't think twitter actually navigates when it shows the image (pushstate probably), and if I've left the image zoomed when I go back (I usually hit the back button on the phone rather than find the X on the image) then my whole feed is zoomed. If it's zooming the whole feed behind the visible image that's a lot of calculating.

More details: I switched twitter from "night mode" to normal white background. When the rendering goes dead the webview goes pure white. The image I was zooming was not white, and twitter's lightboxing of images had the same dark background it did in night mode. Other tabs are all white (or all black before) regardless of their contents prior to getting into this stuck state.

The new tab page is visible and completely functional. When I'm stuck I can add a new tab and load a page (I clicked one of the pocket links). After "loading" the webview is blank (white, previously black) but if I open the tab selector page I see a correct thumbnail for the just-loaded page. Layout is working fine, it's just not drawing.

Summary: Rendering goes dead (webview is black on all tabs), must kill app to get it working again. → Rendering goes dead (webview is blank on all tabs), must kill app to get it working again.

I just started noticing this in the past week or so (beta 66), didn't experience this in 65. I don't know how many people use Firefox to view social sites rather than use the dedicated apps, but considering one selling point for doing so is better privacy/more control it seems to be a good fit with our target audience which would make this a very annoying bug.

Note that bug 1532577 describes very similar symptoms even in 65, though of course there could still be a regression here that made things worse.

Sorina, can you or one of your team try to reproduce with logcat going to see if you can get better information?
We can also follow up to check the Play Store dashboard for ANR rate in beta.
Will, is there any way to check this in telemetry as well?

Flags: needinfo?(wlachance)
Flags: needinfo?(sorina.florean)

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #6)

Will, is there any way to check this in telemetry as well?

Unfortunately I can't think of a way to measure this with telemetry. :( But I am not an expert in this domain-- perhaps ask David Bolter or someone on his team.

Flags: needinfo?(wlachance)

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #6)

Sorina, can you or one of your team try to reproduce with logcat going to see if you can get better information?
We can also follow up to check the Play Store dashboard for ANR rate in beta.
Will, is there any way to check this in telemetry as well?

For the play store dashboard, I don't see data specific for b11. Over the 66 beta cycle, there are a few issues at the top of the ANR list:

*Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 9. Wait queue head age: 25492.0ms.)
*Broadcast of Intent { act=android.intent.action.SCREEN_OFF flg=0x50000010 launchParam=MultiScreenLaunchParams { mDisplayId=0 mFlags=0 } (has extras) }

The second one on that list seems to have all Samsung devices at the top of the affected list.

Would this show up as an ANR, though? The bug description so far only mentions web content failing to render, not the whole app becoming unresponsive.

Needinfo on Andrei to take a look at this bug. Thanks!

Flags: needinfo?(sorina.florean) → needinfo?(andrei.bodea)

I was able to reproduce this issue exactly as described in Comment 0 with a Google Pixel 3XL (Android P) on the following builds: latest Nightly build 67.0a1 and latest Beta build 66.0b13.
However, I managed to reproduce the issue only after double-tapping a few times on a zoomed-in image.
At the moment the white screen appeared, the following error was displayed in the console:
4637-4657/? E/GeckoConsole: [JavaScript Error: "1551887615883 Telemetry::CoveragePing ERROR no endpoint base set" {file: "resource://gre/modules/Log.jsm" line: 679}] append@resource://gre/modules/Log.jsm:679:12 log@resource://gre/modules/Log.jsm:360:16 error@resource://gre/modules/Log.jsm:368:10 startup@resource://gre/modules/CoveragePing.jsm:40:11 setupTelemetry/this._delayedInitTask<@resource://gre/modules/TelemetryController.jsm:719:30
Here you can find the logcat and the video.

I was also able to partially reproduce the issue on the following devices: Samsung Galaxy S7 (Android 7.0) and Samsung Galaxy Note 9 (Android 8.1.0). After zooming in on part of an image and pinching a little, the image starts shrinking and turns black, but after any further action (for example pinching, zooming in/out, or moving the image) everything gets back to normal.
I don't have to restart or force quit Fennec to get past the issue; it's just a black screen displayed instead of the zoomed picture.
I tried every step from Comment 0, Comment 1, Comment 2 and Comment 3, but this is the only way I've managed to reproduce the issue on these devices.
Daniel, you mentioned in Comment 1 that after tapping the device back button to return to the feed everything is zoomed in. Can you please check the following: tap the 3-dot menu -> Settings -> Accessibility and check whether "Always enable zoom" is ON or OFF? Maybe this is what causes the feed to stay zoomed in.

Note that I was able to reproduce the issue fully only on the Google Pixel 3XL (Android P), and only partially on the Samsung Galaxy S7 (Android 7.0) and Samsung Galaxy Note 9 (Android 8.1.0), as described above.
On the Google Pixel 3XL, every new page/tab I opened was white, no matter which website I opened, until force quit/restart.
I hope the logcat will bring some useful new information regarding this issue.

Thanks,
Andrei

Flags: needinfo?(andrei.bodea)

[Tracking Requested - why for this release]:

Tracking for 67 until we know more about the issue, get a regression window and can confirm that the issue also affects 67.

Andrei or Laurentiu, can you find a regression window? Thanks

Flags: needinfo?(andrei.bodea)

"Webview" is throwing me off... tweaking summary.

Summary: Rendering goes dead (webview is blank on all tabs), must kill app to get it working again. → Rendering goes dead (page is blank on all tabs), must kill app to get it working again.
Flags: needinfo?(laurentiu.apahidean)

Hello, I could not find a regression-window as I was able to reproduce this issue with the FF 51.0 build.
Note that the issue is not very easy to reproduce, as I described in Comment 11 where I said that I also reproduced it on the Nightly 67.0a1 build.

Flags: needinfo?(andrei.bodea)
Flags: needinfo?(laurentiu.apahidean)

Calling this P1 now for investigation.

Priority: -- → P1

Is this something SV can debug?

Flags: needinfo?(sarentz)

Sending to the Graphics Bugzilla component because James says this looks like an OOM from Adreno memory fragmentation.

Component: General → Graphics
Flags: needinfo?(sarentz)
Priority: P1 → --
Product: Firefox for Android → Core
Whiteboard: [geckoview] → [geckoview:p2]
Version: Firefox 66 → 66 Branch
Attached video 2019_03_26_00_43_12.mp4 (deleted) —

Hey Jamie - would you be able to take a look at this and let me know if you can repro and track down a potential cause? Then we can determine where to fit a potential fix in with our priorities.

Flags: needinfo?(jnicol)

Botond, do you have any thoughts?

No thoughts beyond comment 19.

That was quick! :)

The symptoms do indeed sound like we're running out of graphics memory. I can't reproduce this at all though. Twitter doesn't even let me zoom very far in to images, and none of the symptoms appear for me.

Daniel, could you set layers.dump=true in about:config, and attach the logcat while reproducing?

Flags: needinfo?(jnicol) → needinfo?(dveditz)
Attached file display list + layer dump (deleted) —

If I enable "always enable zoom" in the settings then I can reproduce. Here is the display list and layer dump.

Layer 0xb42bf800 seems to be the main culprit. It has a valid region of w=10465, h=18672 which is far too large. I need to figure out why we're allowing that.
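
For a sense of scale, here is a rough back-of-the-envelope estimate of what a valid region of that size could cost. It assumes 4 bytes per pixel (the dump does not state the surface format, so this is an assumption), and it compares against a 1440x2560 screen-sized buffer, the Galaxy S7's nominal panel resolution, used purely for comparison. The function name is illustrative only.

```python
# Rough memory estimate for the valid region reported in the layer dump.
# Assumption: 4 bytes per pixel (e.g. an RGBA8-style surface); the real
# surface format and any tiling overhead are not stated in the bug.
BYTES_PER_PIXEL = 4


def region_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Return the size in bytes of an uncompressed width x height buffer."""
    return width * height * bytes_per_pixel


# Valid region from the dump: w=10465, h=18672.
layer_bytes = region_bytes(10465, 18672)

# Hypothetical comparison point: one screen-sized buffer at 1440x2560.
screen_bytes = region_bytes(1440, 2560)

print(f"layer:  {layer_bytes / 2**20:.1f} MiB")
print(f"screen: {screen_bytes / 2**20:.1f} MiB")
print(f"ratio:  {layer_bytes / screen_bytes:.0f}x")
```

Under these assumptions the single layer would need several hundred MiB, roughly 50 times a screen-sized buffer, which would be consistent with the out-of-graphics-memory symptoms.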

Flags: needinfo?(dveditz)

Is Bug 1529892 related?

It has the same end result, but the cause might be different.

Assignee: nobody → botond
Priority: -- → P2

Why does the Action Bar pop up?

Flags: needinfo?(aethanyc)

Our current theory is that the OOM is due to the fact that we now size position:fixed elements to the layout viewport, while continuing to paint the entire element (which would make this a regression from bug 1465616). At high zoom levels, that can be a very large area compared to the visual viewport (screen size). We are addressing this in bug 1529892, which will give position:fixed elements a "displayport", similar to scrollable content.
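
The theory above can be illustrated with a minimal sketch. This is not Gecko's implementation: the `Rect` class, the margin value, the zoom factor and all dimensions are hypothetical, chosen only to show how clipping a zoomed position:fixed element to a displayport (here, the visual viewport inflated by a margin) shrinks the painted area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in device pixels (hypothetical helper)."""
    x: float
    y: float
    w: float
    h: float

    def area(self) -> float:
        return self.w * self.h

    def scaled(self, factor: float) -> "Rect":
        return Rect(self.x * factor, self.y * factor, self.w * factor, self.h * factor)

    def inflated(self, margin: float) -> "Rect":
        return Rect(self.x - margin, self.y - margin,
                    self.w + 2 * margin, self.h + 2 * margin)

    def intersect(self, other: "Rect") -> "Rect":
        x0, y0 = max(self.x, other.x), max(self.y, other.y)
        x1 = min(self.x + self.w, other.x + other.w)
        y1 = min(self.y + self.h, other.y + other.h)
        return Rect(x0, y0, max(0.0, x1 - x0), max(0.0, y1 - y0))


# Hypothetical numbers: a fixed element sized to a 400x700 layout viewport,
# viewed at 8x zoom, on a screen showing a 400x700 window into it.
ZOOM = 8.0
fixed_element = Rect(0, 0, 400, 700).scaled(ZOOM)   # painted in full: 3200x5600
visual_viewport = Rect(1000, 2000, 400, 700)        # what the screen shows

# With a displayport: paint only the visible part plus a margin.
displayport = visual_viewport.inflated(200).intersect(fixed_element)

print(f"painted without displayport: {fixed_element.area():,.0f} px")
print(f"painted with displayport:    {displayport.area():,.0f} px")
```

In this toy setup the full element covers about 20 times more pixels than the displayport-clipped region, and the gap grows with the square of the zoom level.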

Depends on: 1529892

I'm going to clear the regressionwindow-wanted flag as one has been proving tricky to get (comment 16), and we have a suspected regressing bug.

Regressed by: 1465616

(In reply to Daniel Veditz [:dveditz][back Apr 15] from comment #3)

I just started noticing this in the past week or so (beta 66), didn't experience this in 65.

The regressing bug identified above landed in 63. However, 66 may have aggravated this bug by increasing the layout viewport size on some pages (bug 1423013 and related bugs).

(In reply to csheany from comment #30)

Why does the Action Bar pop up?

Are you referring to the video in Comment 21? It looks like a desktop browser to me, but there's an ActionBar? Anyway, on Android, the action bar could pop up if the selection is changed because of user interaction or JavaScript.

Flags: needinfo?(aethanyc)

Thank you for your response.

That is what I was talking about.

This was with Fennec on a tablet.

It seems to trigger one aspect, though it doesn't always occur.

https://mobile.twitter.com/mnoorenberghe/status/1108447892702789632/photo/1

Should it be a separate bug?

Botond, this is tracking 67 which goes to release in about 2 weeks. Any further updates on this?

Flags: needinfo?(botond)

I have a candidate fix for bug 1529892 (which we are hoping will fix this). A Try run of the patches is linked from bug 1529892 comment 3; it's not posted for review yet, because there's a test failure that I still need to sort out.

I hope to wrap that up this week, but I'm not optimistic about the chances of the fix being safe enough to uplift to 67.

It's worth noting that if our theory about bug 1529892 being the cause is correct, this regression has been shipping since 63 (though, as mentioned in comment 33, bug 1423013 probably made it worse in 66).

Flags: needinfo?(botond)

Daniel, could you retest this in the latest nightly (which should contain the fix for bug 1529892) and see if the problem still occurs?

Flags: needinfo?(dveditz)

I'm not Daniel but I am still able to reproduce.

(In reply to csheany from comment #39)

I'm not Daniel but I am still able to reproduce.

On which site?

(In reply to csheany from comment #41)

From Comment 20

I don't see what relation the tweet linked from comment 20 has to this bug. The tweet is talking about a devtools feature.

The behavior in the video I posted is with that link

(In reply to Botond Ballo [:botond] from comment #42)

I don't see what relation the tweet linked from comment 20 has to this bug. The tweet is talking about a devtools feature.

Oh, I see: the point is not the content of the tweet, but rather the behaviour of the Android browser when viewing that tweet. That was not clear to me :)

Anyways, assuming comment 21 illustrates the behaviour when viewing the tweet, it doesn't seem to be triggering this bug, which involves the rendering going black.

In fact, there is no indication of any rendering problem in the comment 21 video. It just seems to illustrate zooming in to a point where you can only see the page's background.

So you're saying I should file a separate bug :)

Can you recommend a summary?

Initially the jumpy zooming seemed to occur when that happened.

After testing the link in Comment 35 it happens there as well.

(In reply to csheany from comment #45)

So you're saying I should file a separate bug :)

Can you recommend a summary?

I'm not yet sure what problem you have in mind.

Initially the jumpy zooming seemed to occur when that happened.

Now that you said "jumpy zooming", I'll venture a guess: is the problem you have in mind, that when you're zooming in on a Twitter picture, the zoom level increases in discrete chunks, rather than smoothly?

(If that's what you have in mind, I hope you can understand that this wasn't at all obvious from just the video in comment 21. While I did see that in the video, for all I knew that could have been an artifact of a low frame rate in the video capturing software, and not a reflection of what happens on the actual device. I'll repeat what I said elsewhere: pictures and videos are helpful to illustrate a written description of a problem, but the written description is still important.)

Anyways, if that's the issue you have in mind, please do file a separate bug.

Jumpy zooming was from the description (I will post a video soon)

The behavior in Comment 21 only requires a single tap which is what makes it so strange.

That said, given Bug 1529892 I'm not sure it is still OOM'ing.

To clarify in the meantime, I am noticing two different results...

  1. Zoom in, the content doesn't stay still

  2. Tap, the content moves forward

The video in comment 21 was not what I saw -- your video still had the "x" to close there. Mine went blank except for the chrome at top: that still worked.

FWIW I usually run Beta, not nightly, and I haven't had this problem lately. I do now see a problem where I zoom in and then the image keeps zooming after I stop (and sometimes keeps zooming in and "drifting" even after I've managed to zoom back out a little). A couple of times I crashed the whole browser hard while trying to reproduce this (nothing submitted according to about:crashes). Could be a different symptom of the same underlying cause? But I haven't seen just the web content go out to lunch in a while.

I'll try nightly and see what happens.

(In reply to csheany from comment #48)

To clarify in the meantime, I am noticing two different results...

  1. Zoom in, the content doesn't stay still

  2. Tap, the content moves forward

It's worth noting that Twitter implements its own zooming in this image view. This is likely to be an artifact of their zooming code, or, if you've checked "Always enable zoom", an interaction between their zooming and ours. I would suggest taking discussion of this to a new bug.

(In reply to Daniel Veditz [:dveditz] from comment #49)

I do now see a problem where I zoom in and then the image keeps zooming after I stop (and sometimes keeps zooming in and "drifting" even after I've managed to zoom back out a little).

It's hard to be sure, but this too could be an artifact of Twitter trying to do its own zooming in this view.

A couple of times I crashed the whole browser hard while trying to reproduce this (nothing submitted according to about:crashes).

I don't suppose there's anything resembling a useful backtrace in logcat in a non-debug build?

And the summary is...

I cannot reproduce either the full-app crash or the drawing-is-dead (gecko crashed?) effect in Nightly. I also cannot zoom in nearly as much--was that the "fix"?

Flags: needinfo?(dveditz)

(In reply to Daniel Veditz [:dveditz] from comment #53)

I cannot reproduce either the full-app crash or the drawing-is-dead (gecko crashed?) effect in Nightly. I also cannot zoom in nearly as much--was that the "fix"?

No, the fix should not have affected how far you can zoom in. Perhaps you went from testing with "Always enable zoom" (in Settings -> Accessibility) enabled, to disabled?

Daniel, are you able to verify the fix now, or does it need more work?

Flags: needinfo?(dveditz)

Botond was right that my nightly profile did not have "Always enable zoom" turned on. Once enabled both nightly and beta could zoom equivalently. On Beta I can still reproduce a full-app crash (no longer the original symptoms of the "app" continuing to function but the browser part being gone and not restarting). On Nightly I can't make it crash or stop drawing.

It was never 100% reliable, but I've played with it much longer (multiples of the time) than it would take to cause a crash in Beta, and it appears to be fixed in Nightly.

Flags: needinfo?(dveditz)

Thanks Daniel.

Botond, with Daniel's test results, is it safe to mark it fixed for 68 and maybe also move the 67 tracking flag to bug 1529892 (which has the fix) and then uplift the patch there? I believe the last beta build for 67 is targeting tomorrow.

Flags: needinfo?(botond)

Yes, I think we can close this as a duplicate of bug 1529892.

If people are experiencing issues other than OOM on Twitter's picture view (as suggested in comment 48 and comment 49), those can be triaged and investigated separately in another bug.

I discussed the possibility of an uplift with Kats and Jaime, and there seems to be consensus that, while we are late in the cycle, given the severity of the symptom being fixed an uplift request is justified.

Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(botond)
Resolution: --- → DUPLICATE
Has Regression Range: --- → yes