Closed Bug 1172167 Opened 9 years ago Closed 9 years ago

[Flame] Apps are getting OOM-killed more frequently than before

Categories

(Firefox OS Graveyard :: General, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

(b2g-master affected)

RESOLVED WONTFIX
Tracking Status
b2g-master --- affected

People

(Reporter: njpark, Unassigned)

References

Details

(Keywords: regression, smoketest, Whiteboard: fbimage [systemsfe][319MB-Flame-Support])

Attachments

(1 file)

STR:
Open the Gallery app, go to the homescreen, open Contacts, go back to the homescreen.
Long-press the home button to activate the card view.

Expected: Shows the Gallery and Contacts cards.
Actual: Shows the most recent app, or does not show any card at all.

Logcat: https://pastebin.mozilla.org/8835916

Version Info:
Build ID 20150605010203
Gaia Revision 65369b217faac7d70c1a29100c4208c6d1db16e3
Gaia Date 2015-06-04 20:28:16
Gecko Revision https://hg.mozilla.org/mozilla-central/rev/0496b5b3e9ef
Gecko Version 41.0a1
Device Name flame
Firmware(Release) 4.4.2
Firmware(Incremental) eng.cltbld.20150605.043730
Firmware Date Fri Jun 5 04:37:39 EDT 2015
Bootloader L1TC000118D0
FYI, this might cause intermittent failures on cards_view/test_cards_view_with_two_apps.py
Flags: needinfo?(martijn.martijn)
Whiteboard: fbimage
I've been in the cards view, on master, on my Flame most of this week and I've not seen this. I just double-checked on current master with the STR and cannot reproduce. Are the apps being killed maybe, is there anything in logcat or the console? Can you edge gesture to them or spot them in the System app's DOM tree (#windows > .appWindow)?
Component: Gaia::System → Gaia::System::Task Manager
Whiteboard: fbimage → fbimage [systemsfe]
I checked with b2g-ps, and it looks like the app is getting killed at random times after it is sent to the background, probably by the LMK. It gets killed either when going back to the homescreen or when launching a new app. I also noticed this today, though. My configuration is a 319MB Flame running an eng build, but I haven't come across this before.

Below is when I was launching the Contacts app after coming back from the Gallery app:

APPLICATION     SEC USER      PID   PPID  VSIZE  RSS   WCHAN    PC       NAME
b2g             0   root      13685 1     301928 46796 ffffffff b6f20894 S /system/b2g/b2g
(Nuwa)          0   root      13696 13685 98240  5436  ffffffff b6f20894 S /system/b2g/b2g
OperatorVariant 2   u0_a13939 13939 13696 114400 7900  ffffffff b6f20894 S /system/b2g/b2g
Gallery         2   u0_a20368 20368 13696 130644 25396 ffffffff b6f20894 S /system/b2g/b2g
Homescreen      2   u0_a20539 20539 13696 182572 43508 ffffffff b6f20894 S /system/b2g/b2g
(Preallocated a 2   u0_a20731 20731 13696 112888 21252 ffffffff b6f20894 S /system/b2g/b2g

No-Juns-Mac-Pro:~ g5njpark$ adb shell b2g-ps
APPLICATION     SEC USER      PID   PPID  VSIZE  RSS   WCHAN    PC       NAME
b2g             0   root      13685 1     305688 64448 ffffffff b6f20894 S /system/b2g/b2g
(Nuwa)          0   root      13696 13685 98240  5116  ffffffff b6f20894 S /system/b2g/b2g
OperatorVariant 2   u0_a13939 13939 13696 114400 7804  ffffffff b6f20894 S /system/b2g/b2g
Homescreen      2   u0_a20539 20539 13696 152600 28296 ffffffff b6f20894 S /system/b2g/b2g
Communications  2   u0_a20731 20731 13696 126688 28688 ffffffff b597dfcc R /system/b2g/b2g

Just curious, does the VSIZE of b2g look normal? I ask because the following is the output from a build from a few days ago, where this bug does not occur:

APPLICATION     SEC USER      PID   PPID  VSIZE  RSS   WCHAN    PC       NAME
b2g             0   root      207   1     282236 74312 ffffffff b6ee3894 S /system/b2g/b2g
(Nuwa)          0   root      459   207   98116  7404  ffffffff b6ee3894 S /system/b2g/b2g
Homescreen      2   u0_a949   949   459   154400 28292 ffffffff b6ee3894 S /system/b2g/b2g
Gallery         2   u0_a1595  1595  459   134400 25184 ffffffff b6ee3894 S /system/b2g/b2g
(Preallocated a 2   u0_a1653  1653  459   112892 22196 ffffffff b6ee3894 S /system/b2g/b2g
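
For anyone who wants to compare b2g-ps listings like the ones above without eyeballing them, here is a minimal Python sketch. The helper names (`parse_b2g_ps`, `killed_between`) are made up for illustration and are not part of any B2G tool. It assumes the column layout shown in this comment, where the ten rightmost whitespace-separated fields are fixed and everything to their left is the application name (which may contain spaces, as in "(Preallocated a").

```python
# Sample data copied from the two listings taken on the failing build.
BEFORE = """\
APPLICATION SEC USER PID PPID VSIZE RSS WCHAN PC NAME
b2g 0 root 13685 1 301928 46796 ffffffff b6f20894 S /system/b2g/b2g
(Nuwa) 0 root 13696 13685 98240 5436 ffffffff b6f20894 S /system/b2g/b2g
OperatorVariant 2 u0_a13939 13939 13696 114400 7900 ffffffff b6f20894 S /system/b2g/b2g
Gallery 2 u0_a20368 20368 13696 130644 25396 ffffffff b6f20894 S /system/b2g/b2g
Homescreen 2 u0_a20539 20539 13696 182572 43508 ffffffff b6f20894 S /system/b2g/b2g
(Preallocated a 2 u0_a20731 20731 13696 112888 21252 ffffffff b6f20894 S /system/b2g/b2g
"""

AFTER = """\
APPLICATION SEC USER PID PPID VSIZE RSS WCHAN PC NAME
b2g 0 root 13685 1 305688 64448 ffffffff b6f20894 S /system/b2g/b2g
(Nuwa) 0 root 13696 13685 98240 5116 ffffffff b6f20894 S /system/b2g/b2g
OperatorVariant 2 u0_a13939 13939 13696 114400 7804 ffffffff b6f20894 S /system/b2g/b2g
Homescreen 2 u0_a20539 20539 13696 152600 28296 ffffffff b6f20894 S /system/b2g/b2g
Communications 2 u0_a20731 20731 13696 126688 28688 ffffffff b597dfcc R /system/b2g/b2g
"""


def parse_b2g_ps(text):
    """Return {pid: (application, rss)} parsed from b2g-ps output.

    Assumption: each process row ends with exactly ten fixed fields
    (SEC USER PID PPID VSIZE RSS WCHAN PC state path), so splitting
    from the right keeps multi-word application names intact.
    """
    procs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("APPLICATION"):
            continue  # skip blank lines and the header row
        fields = line.rsplit(None, 10)
        if len(fields) != 11:
            continue  # not a process row in the assumed layout
        app, _sec, _user, pid, _ppid, _vsize, rss = fields[:7]
        procs[int(pid)] = (app, int(rss))
    return procs


def killed_between(before, after):
    """Names of processes whose PID is present in `before` but not in `after`."""
    return sorted(app for pid, (app, _rss) in before.items() if pid not in after)


before = parse_b2g_ps(BEFORE)
after = parse_b2g_ps(AFTER)
print(killed_between(before, after))  # ['Gallery']
```

Comparing by PID rather than by name matters here: PID 20731 shows up as "(Preallocated a" in the first listing and as "Communications" in the second, so it is the same process being reused, not a kill. Only Gallery (PID 20368) is actually gone.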
Flags: needinfo?(sfoster)
(In reply to Martijn Wargers [:mwargers] (QA) from comment #4)
> Created attachment 8616477 [details]
> Pull request - Bug 1172167 - Disable these Cards view Gaia UI tests for
> Flame device

Merged: https://github.com/mozilla-b2g/gaia/commit/0deeca296a9530bbfb683ec90d3d28117f33bca8

Sam, these tests should be able to run with 319MB on the Flame device. If those apps are killed in the background in current builds, I guess that means there is some kind of memory increase regression going on.
Keywords: smoketest
I think the oom-ing is also the reason for bug 1172212.
Blocks: 1172343
(In reply to No-Jun Park [:njpark] from comment #3)
> I checked with b2g-ps, and it looks like the app is getting killed at random
> times after it is set to background, probably caused by lmk. It either gets
> killed when going back to the homescreen, or launching a new app.

I'm in the middle of getting a new B2G build environment set up to get access to the reference workloads. I'm on a freshly flashed 319MB Flame and, with no data to speak of in Gallery or Contacts, this does not reproduce. If anyone else can confirm this is memory-related in the meantime, that would be great.
Flags: needinfo?(sfoster)
Depends on: 1162535
Adding qawanted to repro this behavior on a 319MB Flame. It doesn't have to be exactly Contacts and Gallery. Try to see whether the card view 'forgets' any recently opened app after returning to the Homescreen.
Keywords: qawanted
I can reproduce this issue, though not every time. The root cause is still the memory regression on v3.0 (bug 1162535). The issue reproduces more often on engineering builds because they have more apps on the Homescreen. It also reproduces more often when the app being opened uses more memory, such as the Gallery app (when it has some pictures in it) or the Camera app.

Device: Flame (user/eng build, 319MB, KK)
BuildID: 20150608010204
Gaia: 1d62b32408567f9f7cf1c71c1e5a0c6593be757b
Gecko: 7d4ab4a9febd
Gonk: 040bb1e9ac8a5b6dd756fdd696aa37a8868b5c67
Version: 41.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: qawanted
Aha, thanks for pointing that out. Yes, I was testing on the eng build as well.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
[Blocking Requested - why for this release]: Removing smoketest blocker given it's mainly happening on engineering builds and not 100%. Nominating to block release, could be considered a 'qablocker'.
blocking-b2g: --- → 3.0?
Keywords: smoketest
I'm still a bit confused, though. If the apps are getting killed by the LMK (or by Gecko in general), they should be in a suspended state and still on the stack (as managed by StackManager, where TaskManager gets its app list from). That was the point of bug 941238. If we don't feature these apps in the task manager, it is a fairly pointless feature for low-memory devices, as OOM with multiple apps running is the normal and expected state (memory regressions notwithstanding).

So, apps getting killed more frequently by the LMK due to low memory may well be a regression, but they should continue to be represented as cards in the task manager; selecting such a card revives that app.
s/OOM/low memory/
From my limited knowledge, it looks to me like when apps get killed due to low memory, they disappear from the list generated by b2g-ps or top, so I think they are really dead rather than in a suspended state. That is also when the app's card disappears from the list. So in theory, the card view should retain the list of all apps that I ever opened unless I explicitly delete them from the card view?
(In reply to Sam Foster [:sfoster] from comment #12)
> So, apps getting killed more frequently by the LMK due to low-memory may
> well be regression, but they should continue to be represented as cards in
> the task manager; selecting such a card revives that app.

You might want to file a new bug about that. This bug is about the apps getting killed.
We don't do regression windows on performance issues because the window will not be accurate.
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Blocks: 1176037
Depends on: 1176596
(In reply to Martijn Wargers [:mwargers] (QA) from comment #15)
> (In reply to Sam Foster [:sfoster] from comment #12)
> > So, apps getting killed more frequently by the LMK due to low-memory may
> > well be regression, but they should continue to be represented as cards in
> > the task manager; selecting such a card revives that app.
>
> You might want to file a new bug about that. This bug is about the apps
> getting killed.

I filed bug 1176596 for it now.
(In reply to Pi Wei Cheng [:piwei] from comment #16)
> We don't do regression windows on performance issues because the window will
> not be accurate.

This is not a performance issue; it is a memory regression with clear STR (although it might not always be reproducible). I guess I should try to get a regression window here.
Works in:
Build ID 20150604160205
Gaia Revision e0fbadeb78a96137f071d9be7a47ef9fe882d17f
Gaia Date 2015-06-04 07:44:30
Gecko Revision https://hg.mozilla.org/mozilla-central/rev/5b4c240e1a36
Gecko Version 41.0a1

Fails in:
Build ID 20150605010203
Gaia Revision 65369b217faac7d70c1a29100c4208c6d1db16e3
Gaia Date 2015-06-04 20:28:16
Gecko Revision https://hg.mozilla.org/mozilla-central/rev/0496b5b3e9ef
Gecko Version 41.0a1

This was tested on a Flame with 296MB memory set.
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2015-06-04+13%3A23%3A27&enddate=2015-06-05+01%3A29%3A27

The Gecko landings in this range don't seem to include anything that could make this happen, afaict. But bug 1094759 landed within this regression range. Alive, could you take a look at this?
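
As a side note for anyone narrowing the window further, a pushlog link of the same shape can be generated for any pair of timestamps. This is only a sketch: the function name `pushlog_url` is hypothetical, and the `startdate` and `enddate` query parameters are simply the ones visible in the URL above.

```python
from urllib.parse import urlencode

# Base URL taken from the pushlog link in this comment.
PUSHLOG_BASE = "https://hg.mozilla.org/mozilla-central/pushloghtml"


def pushlog_url(start, end):
    """Build a pushlog URL for the window between two 'YYYY-MM-DD HH:MM:SS' strings.

    urlencode() form-encodes the values, turning spaces into '+' and
    colons into '%3A', which matches the encoding of the link above.
    """
    return PUSHLOG_BASE + "?" + urlencode({"startdate": start, "enddate": end})


print(pushlog_url("2015-06-04 13:23:27", "2015-06-05 01:29:27"))
```

With the two timestamps used in this comment, the function reproduces the link above exactly.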
Flags: needinfo?(alegnadise+moz)
(In reply to Martijn Wargers [:mwargers] (QA) from comment #20)
> https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2015-06-04+13%3A23%3A27&enddate=2015-06-05+01%3A29%3A27
> Gecko landing don't seem to show anything that could make this happen,
> afaict.
>
> But this regression range is in the range when bug 1094759 landed. Alive,
> could you take a look at this?

Alive is inactive now. Forwarding NI to Tim.
Flags: needinfo?(alegnadise+moz) → needinfo?(timdream)
I would advise that we look into the card view itself (or the app state backing the card view) directly. Cards should always be there even if the app is killed; that is what I am told.
Flags: needinfo?(timdream)
(In reply to Tim Guan-tin Chien [:timdream] (slow response; please ni? to queue) from comment #23)
> I would advise we look into card view itself (or, app state backing the card
> view) directly. Cards should always be there even if the app is killed, that
> is what I am told.

Tim, I've filed bug 1176596 for that. You're right that if that bug were solved, this issue would also be fixed.

But I have a question about using the Flame device with 319MB memory. If the Gallery and Contacts apps are opened and then the Cards View is opened, should the Gallery and/or the Contacts app get killed on a Flame device with 319MB memory, or should they always stay open? Because, going by this test, that is what seems to be happening nowadays. It could have multiple causes, I guess: a memory regression, a more aggressive LMK, or something else. I also wonder if it's worth the effort to find the cause.
Flags: needinfo?(timdream)
(In reply to Martijn Wargers [:mwargers] (QA) from comment #24)
> (In reply to Tim Guan-tin Chien [:timdream] (slow response; please ni? to
> queue) from comment #23)
> > I would advise we look into card view itself (or, app state backing the card
> > view) directly. Cards should always be there even if the app is killed, that
> > is what I am told.
>
> Tim, I've filed bug 1176596 for that. You're right that if that bug would be
> solved, this issue would also be fixed.

I would say we should dup this bug there, then.

> But I have a question using the Flame device with 319MB memory. If the
> Gallery and the Contacts app is opened and then the Cards View is opened,
> should the Gallery and/or the Contacts app get killed on a Flame device with
> 319MB memory or should they always stay open?
> Because this is something that seems to be happening nowadays, regarding
> this test.
> It could have multiple causes, I guess. Either memory regression or more
> aggressive lmk or something. I also wonder if it's worth the effort to find
> the cause.

I am not aware of whether we have any rigid OOM requirement nowadays (e.g. start X apps and X - N apps should stay alive, N = ?). If we do, then this would become a valid bug.
Flags: needinfo?(timdream)
(In reply to Tim Guan-tin Chien [:timdream] (slow response; please ni? to queue) from comment #25)
> I am not aware of if we have any rigid OOM requirement nowadays (e.g. start
> X apps and X - N apps should stay alive, N = ?), if so then this would
> become a valid bug.

I don't know either. All I know is that in this situation, the 2 apps used to stay open. So either the apps or the system are using more memory nowadays, or the OOM killer is more aggressive nowadays. Perhaps we're fine with that situation, I don't know.
blocking-b2g: 2.5? → ---
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][memory-failure]
Whiteboard: fbimage [systemsfe] → fbimage [systemsfe][319MB-Flame-Support]
[Blocking Requested - why for this release]: Breaks other Dialer bug. Nominating for 2.5
blocking-b2g: --- → 2.5?
Priority: -- → P2
[Blocking Requested - why for this release]: Can we determine if the OOM killer threshold changed or if *specific apps* regressed mem usage enough to be causing this?
Priority: P2 → --
Priority: -- → P2
(In reply to Mahendra Potharaju [:mahe] from comment #28)
> [Blocking Requested - why for this release]:
>
> Breaks other Dialer bug. Nominating for 2.5

What other Dialer bug?
(In reply to Gregor Wagner [:gwagner] from comment #30)
> (In reply to Mahendra Potharaju [:mahe] from comment #28)
> > [Blocking Requested - why for this release]:
> >
> > Breaks other Dialer bug. Nominating for 2.5
>
> What other Dialer bug?

Sorry, I had placed this comment on the wrong bug in Triage, and later forgot to edit it.
Not blocking on low memory use-cases. The behavior changed but it does the right thing in a low memory situation.
blocking-b2g: 2.5? → ---
This seems to have gotten worse, so I'm adding back smoketest. I can't seem to keep more than 1 app open, and quite often it will even kill that 1 opened app as well.
Would a more accurate title for this bug be "Apps are getting OOM-killed more frequently than before"? This doesn't sound like a Task Manager bug.
I've updated the bug title and moved to General for re-triage. The description on this bug describes a symptom made visible in Task Manager, but the bug itself is *not yet diagnosed*. We do have a regression window in Comment #19. For the issue identified in comment #12 which is Task Manager-specific, we have bug 1176596.
Component: Gaia::System::Task Manager → General
Summary: [Flame][Cards View] Cards view loses the list of recently opened apps → [Flame] Apps are getting OOM-killed more frequently than before
This is working as expected. Several apps are using more memory and the OOM killer kicks in and does the right thing. There is nothing to fix here.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
I don't see any mention of bug 1204837; my guess is that the behavior seen here was largely caused by that bug. For what it's worth, I can't repro the original STR using my Flame in the 319MiB configuration.