Closed Bug 892371 Opened 11 years ago Closed 10 years ago

[Activity] Adjust OOM score of activity opener

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.2+, firefox37 wontfix, firefox38 wontfix, firefox39 fixed, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
2.2 S8 (20mar)
blocking-b2g 2.2+
Tracking Status
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: alive, Assigned: gsvelto)

References

Details

(Whiteboard: [caf priority: p2][CR 798545][TD-57881] )

Attachments

(4 files, 8 obsolete files)

(deleted), patch
gsvelto
: review+
Details | Diff | Splinter Review
(deleted), patch
gsvelto
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
See https://bugzilla.mozilla.org/show_bug.cgi?id=887650#c10
And tells gaia via system message who is the activity opener by passing manifestURL and pageURL.

Fabrice, could you take this?
Flags: needinfo?(fabrice)
Yes...
Assignee: nobody → fabrice
Flags: needinfo?(fabrice)
blocking-b2g: leo? → leo+
Priority: -- → P1
Target Milestone: --- → 1.1 QE4 (15jul)
(In reply to Fabrice Desré [:fabrice] from comment #1)
> Yes...

Are you overloaded ;)

ActivityService is written in javascript so I could help but I know nothing about adjusting OOM adj stuff in gecko now. Studying.
I think the key-point is at |dom/ipc/ProcessPriorityManager:computePriority()|
Before taking hidden state into account when computing priority,
we need to find out if current process has an opened activity process and find out the most top one,
change all the priority to higher(less) than its next activity.

So
(1) In dom/activities/src/ActivitiesService:Start/PostResult/PostError, notify the caller with 'activity-state-changed' event or sth. like this
(2) Expose activity info to ProcessPriorityManager to use in computePriority (Still don't know how now.)
Maybe bug 822325 will make this work well enough.
Alan, here is gaia WIP:
https://github.com/alivedise/gaia/commit/1679ed57c7ebae7b295635b58a784f21f99a3e42

Note: I still need gecko to tell me who is inline activity opener but I could finish this part on my own.
(In reply to Justin Lebar [:jlebar] from comment #4)
> Maybe bug 822325 will make this work well enough.

IMO 822325 is trying to give a background frame even lower priority when time goes by. If so it's irrelevant to this one.
Ah, the inline activity opener should have /higher/ priority than the inline activity itself?
(In reply to Justin Lebar [:jlebar] from comment #7)
> Ah, the inline activity opener should have /higher/ priority than the inline
> activity itself?

I discussed this with Alan. Android give 'the same' priority. Since we kill callee when caller dies I think this is acceptable.
Is there any progress on this defect?
Not yet... I will work on it this week.
Target Milestone: 1.1 QE4 (15jul) → 1.1 QE5
Removing leo+ and milestone as no longer blocker by leo.
blocking-b2g: leo+ → koi?
Whiteboard: [TD-57881] → [TD-57881]
Target Milestone: 1.1 QE5 → ---
Blocks: 898307
Alive,

Are we working on this bug for 1.2?
Flags: needinfo?(alive)
Please see bug 822325, it would offer some help to this bug.
I need alan's comment here.
Flags: needinfo?(alive)
Fabrice,

Can you please comment on next steps?
Flags: needinfo?(fabrice)
After discussing with Alan,
I still think this is not totally resolved by 822325.

822325 could only fix the case: there's only one caller and one callee.
However it's possible to have multiple callers(callee becomes caller).

In android if the process is having some relationship with the other process, the lower proirity one should be uplifted to the higher.

Fabrice, my original proposal is still:
1. Add some flag in activity caller if it's having a callee.
2. In process priority manager, check the flag before checking the background state to set OOM adj.
I think I see how to do that, but we should postpone to 1.3.
Flags: needinfo?(fabrice)
Depends on: 822325
blocking-b2g: koi? → 1.3?
Moving to 1.3
(In reply to Preeti Raghunath(:Preeti) from comment #17)
> Moving to 1.3

This isn't a committed feature for 1.3, so I'm clearing the nom.
blocking-b2g: 1.3? → ---
Blocks: 846850
Blocks: 960280
c.c. Cyu
At first thought, we can extend the BackgroundLRUPool to include foreground apps, but we do the opposite to background apps that the latest app gets killed first. Does this sound OK?
Just a reminder here, not actually related to this comment 21. We've already ran out of LMK levels (6) in b2g.js :(
Blocks: 994518
Fabrice, just wondering how to move this forward. The bug is more than 1 year old, and it was postponed several times already.

Alive says it's the blocker for bug 1072874: we don't set "visible" to false to windows that start an activity because we don't want that the background window gets OOM killed. But this causes other issues because then apps can't rely on the visibilitychange event and the document.hidden property.
Flags: needinfo?(fabrice)
(In reply to Julien Wajsberg [:julienw] from comment #23)
> Alive says it's the blocker for bug 1072874: we don't set "visible" to false
> to windows that start an activity because we don't want that the background
> window gets OOM killed. But this causes other issues because then apps can't
> rely on the visibilitychange event and the document.hidden property.

The underlying issue is that we have a tight coupling between visibility and priorities. I made a rather long post on dev-b2g about what can be done to fix the situation [1]. On that thread both Fabrice and Alive pointed out that the best way to go about this would be to have an API to explicitly set the priority of an application so that the window manager / system app is free to use the visibility information without having to worry about priorities changing.

I made a rough proposal in [2] but never got around to implementing it. I'm stuck fixing dialer blockers and I don't know when I'll be able to help with this. Though if somebody else wants to pick that up I'm ready to walk him through the necessary steps.

However the proper fix would be a rather major change requiring bits in gaia and gecko so if we need something quicker we might resort to a workaround. One that comes to mind is that we might have the activity opener grab a high-priority wakelock and keep it for the duration of the activity. This would allow the system app to flag the activity opener as hidden without causing its priority to drop too much. It's a hack, and an ugly one at that, but it should work in most scenarios IMHO.

[1] https://groups.google.com/d/msg/mozilla.dev.b2g/6vkk59Kw1Us/GJcqmJNKRe4J
[2] https://groups.google.com/d/msg/mozilla.dev.b2g/6vkk59Kw1Us/mFJh5EZTbHwJ
I'm not keen on a hack, as it may be detrimental to other efforts (like power usage). We need to find an owner for that or free Gabriele from dialer tasks.
Flags: needinfo?(fabrice)
blocking-b2g: --- → 2.2?
Blocks: 1102675
Fabrice,

Is this a fix that's realistic to commit to for 2.2? If so, who if not Gabriele can take this?

CAF wants this because the starting app in multi-app user flows can be LMK'd before other background apps i.e. On a 256 MB fxOS Phone opening Contacts then loading Camera to choose a picture causes the Contacts app to be LMK'd before other background apps.
Flags: needinfo?(fabrice)
triage:
per comment 26, user may encounter a bad situation if without the fix.
let's set the blocking decision for now.
It doesn't seem like an easy fix so NI to Gabriele. we can re-evaluate the decision.
blocking-b2g: 2.2? → 2.2+
Flags: needinfo?(gsvelto)
I was considering fixing this and bug 852925 as part of 2.2 but since I'm a dialer peer and when RTL was added to the 2.2 scope I've found myself with more work on my hands than I had anticipated so I'm not so sure of the timeline anymore.

After discussing with Alive in Portland we concluded that the "real" fix for this would be to let the system app explicitly manage process priorities (and thus OOM scores) instead of having the process priority manager second-guess them. Unfortunately such a fix would require far too long for the 2.2 scope and we were considering doing it as part of 3.0.

As a stop-gap solution I considered the following: currently the process priority manager assigns the same OOM score to all foreground applications irrespective of their behavior. This could be adjusted so that the foreground applications are kept in an LRU queue with increasing OOM scores. Activity openers would then get a higher score than the activity they've opened and so applications would be killed starting from the oldest ones which would fix the typical scenario of the three activities chain (call log -> contacts -> SMS or contacts -> SMS -> gallery).

We already have similar machinery that deals with background applications so I think that could be re-purposed relatively easy.

On my side it's just a matter of finding enough available time to do it. I'll try this week but I can't guarantee anything.
Flags: needinfo?(gsvelto)
Thanks Gabriele. The last resort option would be to turn on the dom.ipc.reuse_parent_app pref on the affected devices.
Flags: needinfo?(fabrice)
Here's a first stab at this problem. This WIP patch doesn't introduce any new logic but makes the LRU machinery we used for background processes generic so we can reuse it for foreground processes too.

This applies on top of attachment 8564152 [details] [diff] [review] which simplifies the process priority manager but I'm fairly convinced that it can also be easily coerced to apply w/o that patch.
Assignee: fabrice → gsvelto
Status: NEW → ASSIGNED
Whiteboard: [TD-57881] → [CR 798545][TD-57881]
Whiteboard: [CR 798545][TD-57881] → [caf priority: p2][CR 798545][TD-57881]
Still a WIP but his one is working correctly for background processes, the previous one was messing up the LRU queue creation.
Attachment #8567344 - Attachment is obsolete: true
This is the first working patch though it's still missing some important bits:

- There's no new tests for the added functionality, I have to add some before finalizing this.

- I'm not very satisfied with how the patch is structured, I think that I can make the changes to the process priority manager less verbose and a little bit more self-contained.

- The way LRU indexes are assigned right now is not optimal for multiple foreground tasks created by an activity chain. The most recent task will get the base oom_score_adj, the following two will get oom_score_adj + 67. Ideally we'd like the first three foreground processes to have three different oom_score_adj values of increasing magnitude. Having the second and third task have the same oom_score_adj value doesn't solve the typical problem of three chained activities with the third one being a gallery/music picker (e.g. go into contacts, choose a contact, tap to send an SMS -> in the SMS app tap to add a picture -> gallery picker).
Attachment #8568556 - Attachment is obsolete: true
Blocks: 1119459
Gabriele,

How's this progressing? Any sense of when you'll be able to complete this?

Thanks,
Mike
Flags: needinfo?(gsvelto)
Things are shaping up nicely, I'll attach a new version of the patch shortly. It's still not complete yet but it only needs a couple of final touches which I hope to finish by tomorrow.
Flags: needinfo?(gsvelto)
Another iteration of the patch but still a WIP. Code is a lot cleaner now both background and foreground processes are being handled correctly but there's still a bug I need to iron out: when a process is removed from an LRU queue the remaining elements' oom_score_adj value is not dialed back which is kinda fine for background process but doesn't work properly for foreground ones.
Attachment #8570510 - Attachment is obsolete: true
This patch reworks the pseudo-LRU machinery we have in the process priority manager so that it can be used for foreground tasks too. Here's a little bit of background and an explanation of the various changes to help with the review:

Previously we would keep background task in a pseudo-LRU queue made of groups of increasing, power-of-2 sizes starting from 1. This was used to modify the oom_score_adj score of each process so that the oldest one would get killed earlier than the youngest ones. The reason why we didn't use a real LRU queue is that we didn't want to continuously change the oom_score_adj score of each process when a new one was added. The groups worked well enough and greatly reduced the number of adjustments we had to make.

The same basic machinery was reused to deal with multiple foreground processes being present at the same time. However the policy we used for background process wasn't a perfect fit. In the background case the first LRU group would hold one process, then two, then four, etc... For foreground processes we want the first two processes to be in the same group (i.e. have the same oom_score_adj) because they're most likely an activity opener and the opened activity and they shouldn't be prioritized over each other. On the other hand starting from the third process we want to increase the oom_score_adj to properly deal with chains of three activities which are rather common (contacts -> SMS -> gallery is one example). To cope with this requirement I've modified the LRU logic to take a bias factor that I then used to adjust it for the two different classes of processes.

The LRU pool was represented by a singleton (BackgroundProcessLRUPool) which lived outside of the process priority manager and had a roundabout way of dealing with processes because of this. I've made this class generic to use it for both background and foreground processes and made its instances contained in the process priority manager. This simplifies the code as now all priority adjustment operations are done using ParticularProcessPriorityManager objects and not a mix of those and ContentParent depending on the context. I could also remove some public methods from the process priority manager which existed only because the BackgroundProcessLRUPool class needed to access them.

I've also changed the naming of all the related functions to reflect the more generic nature of the LRU machinery.

It's a big patch but fortunately it removes more lines than it adds and I think that it makes the relevant code more self-contained which is a good thing. The tests were also adjusted by I didn't have time yet to add a new test for the added functionality. I'll write one soon so that we'll land them together in this bug.

As usual I'm asking for review to Dave for the HAL bits and to Kyle for the DOM ones.
Attachment #8572913 - Attachment is obsolete: true
Attachment #8573575 - Flags: review?(khuey)
Attachment #8573575 - Flags: review?(dhylands)
Comment on attachment 8573575 [details] [diff] [review]
[PATCH] Adjust oom_score_adj values for foreground processes according to an LRU policy

Review of attachment 8573575 [details] [diff] [review]:
-----------------------------------------------------------------

Just some minor stuff.

::: dom/ipc/ProcessPriorityManager.cpp
@@ +43,5 @@
>  // priority manager.
>  //
>  // (Wow, our logging story is a huge mess.)
>  
> +#define ENABLE_LOGGING 1

Was this left on accidentally?

@@ +553,2 @@
>    }
>  

nit: shouldn't there be an else here? (I'm assuming that it isn't possible to add and remove a process from the background pool)

@@ +561,5 @@
> +  if (newPriority == PROCESS_PRIORITY_FOREGROUND &&
> +      aOldPriority != PROCESS_PRIORITY_FOREGROUND) {
> +    mForegroundLRUPool.Add(aParticularManager);
> +  }
> +

nit: and here?

@@ +986,5 @@
> +  if ((mPriority == aPriority) && (mLRU != aLRU)) {
> +    mLRU = aLRU;
> +    hal::SetProcessPriority(Pid(), mPriority, aLRU);
> +
> +    nsPrintfCString ProcessPriorityWithLRU("%s:%d",

nit: ProcessPriorityWithLRU is the name of a variable, so shouldn't it start with a lowercase letter?

@@ +1177,5 @@
> +  , mBias(aBias)
> +{
> +  // We set mLRUPoolLevels according to our pref.
> +  // This value is used to set background process LRU pool
> +  const char *str = ProcessPriorityToString(aPriority);

nit: space after *

@@ +1212,5 @@
> +  // (End of buffer)
> +
> +  // Biasing the input can be used to shift the assignment
> +
> +  uint32_t level = (uint32_t)(log((float)aLRUPoolIndex) / log(2.0));

Another thing that would work is to use the frexp function. The exponent returned is the power of 2 exponent, which is what you want for level here.

@@ +1226,2 @@
>  
> +  if (index != nsTArray<ParticularProcessPriorityManager*>::NoIndex) {

nit: flip logic and return early
Attachment #8573575 - Flags: review?(dhylands) → review+
Thanks for the review!

(In reply to Dave Hylands [:dhylands] from comment #37)
> Just some minor stuff.
> 
> ::: dom/ipc/ProcessPriorityManager.cpp
> @@ +43,5 @@
> >  // priority manager.
> >  //
> >  // (Wow, our logging story is a huge mess.)
> >  
> > +#define ENABLE_LOGGING 1
> 
> Was this left on accidentally?

Yes, I used it for testing.

> nit: shouldn't there be an else here? (I'm assuming that it isn't possible
> to add and remove a process from the background pool)

Yes, I'll add both.

> nit: ProcessPriorityWithLRU is the name of a variable, so shouldn't it start
> with a lowercase letter?

Yes, I think I left the old variable name that was starting with an uppercase letter.

> Another thing that would work is to use the frexp function. The exponent
> returned is the power of 2 exponent, which is what you want for level here.

Good point, though I'm cheating a bit because instead of the 0, 1, 2, 2, 3, 3, 3, 3, ... sequence we want 0, 0, 1, 1, 2, 2, 2, 2, ... but that can be done with frexp() too.

> nit: flip logic and return early

Yes, new patch coming soon.
Updated patch with nits addressed, I'm also working on the tests which I'll attach shortly.
Attachment #8573575 - Attachment is obsolete: true
Attachment #8573575 - Flags: review?(khuey)
Attachment #8574603 - Flags: review?(khuey)
Attachment #8574603 - Flags: review?(dhylands)
Whoops, that was the wrong patch, this is the right one.
Attachment #8574603 - Attachment is obsolete: true
Attachment #8574603 - Flags: review?(khuey)
Attachment #8574603 - Flags: review?(dhylands)
Attachment #8574711 - Flags: review?(khuey)
Attachment #8574711 - Flags: review?(dhylands)
Comment on attachment 8574711 [details] [diff] [review]
[PATCH v2] Adjust oom_score_adj values for foreground processes according to an LRU policy

Review of attachment 8574711 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #8574711 - Flags: review?(dhylands) → review+
Thanks for the reviews! I'm carrying over the r+ flag but replacing the patch because I just realized that the bug number in the comment was wrong. I'll attach the new mochitest too.
Attachment #8574711 - Attachment is obsolete: true
Attachment #8575540 - Flags: review+
And here are the tests. I've added one test that checks LRU adjustments for foreground processes and modified the test for background processes to check that the LRU adjustments not only increase but also decrease when processes are removed from the queue - something we did not do before.
Attachment #8575562 - Flags: review?(khuey)
I had pushed the wrong queue, here's the correct try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=03b658813ecc
I had attached the wrong patch again, this is the correct one and the one that went through the try run. Carrying over the r+ flag.
Attachment #8575562 - Attachment is obsolete: true
Attachment #8576040 - Flags: review+
Time to land, try is green except for MacOS 10.6 that is systematically failing due to a harness problem. I'm not sure what's wrong with it but from the logs it looks genuinely like a harness issue; correct me if I'm wrong.

Note for the sheriffs, attachment 8575540 [details] [diff] [review] is r=dhylands,khuey and attachment 8576040 [details] [diff] [review] is r=khuey. I apologize for the spam but I kept putting the wrong number in the patch comment or just attaching the wrong patch.
Keywords: checkin-needed
Gabriele is going to run another Try push with B2G mochitests and then re-request checkin.
Keywords: checkin-needed
The B2G mochitests try run is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2f85b7ea503
No longer blocks: CAF-v3.0-FL-metabug
The additional try run in comment 49 is green, we should be safe to land now.
Keywords: checkin-needed
Maybe jumping the gun a little here, but attachment 8575540 [details] [diff] [review] doesn't apply to v2.2 cleanly.  :gsvelto, can you please put up a v2.2 patch as well?  We can bring it in here in parallel to the mc landing/uplift process.
Flags: needinfo?(gsvelto)
You probably need the patch for bug 1119277 which has not been uplifted yet. Can you try it out? If with that patch it applies cleanly then I'll ask for approval for that instead of making a custom patch for v2.2. Bug 1119277 removed some unused functionality so it's a lot safer to take that patch than to tweak the one here.
Flags: needinfo?(gsvelto)
Scratch my previous comment, it seems that bug 1119277 also doesn't apply cleanly to v2.2. I'll prepare specific patches for all the bugs involved.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Long standing bug
User impact if declined: Activity chains on low memory devices can have the wrong activity being killed by the LMK when running low on memory
Testing completed: Tested on a device, covered with mochitests and a try run is here https://treeherder.mozilla.org/#/jobs?repo=try&revision=3fefaff61106
Risk to taking this patch (and alternatives if risky): This patch affects how the LMK adjustment scores for background and foreground processes are calculated; this can lead to the wrong apps being killed in low memory scenarios in case the patch doesn't work as it should
String or UUID changes made by this patch: -
Attachment #8576924 - Flags: approval-mozilla-b2g37?
See comment 55.
Attachment #8576926 - Flags: approval-mozilla-b2g37?
https://hg.mozilla.org/mozilla-central/rev/908eaacff6f4
https://hg.mozilla.org/mozilla-central/rev/a30c985bc8dd
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
(In reply to Gabriele Svelto [:gsvelto] from comment #54)
> Scratch my previous comment, it seems that bug 1119277 also doesn't apply
> cleanly to v2.2. I'll prepare specific patches for all the bugs involved.

Thanks!
Hi Gabriele,

So is it fine to send any activity opener to background for now? I read comment 36 but I am not sure it's safe to do this right now; it seems to me you are doing something to all "foreground" process but I am afraid this won't apply if I start to send the opener to background.
Flags: needinfo?(gsvelto)
(In reply to Alive Kuo [:alive][NEEDINFO!] from comment #59)
> So is it fine to send any activity opener to background for now?

Not yet, I still rely on the activity opener to be visible to assign it the right priority. This patch is only a stop-gap measure so that we have a fix for v2.2. For v3 I still want to change this so that most of the priority decisions will be taken by the system app in order to remove the workarounds we have in place in the process priority manager.

> I read
> comment 36 but I am not sure it's safe to do this right now; it seems to me
> you are doing something to all "foreground" process but I am afraid this
> won't apply if I start to send the opener to background.

Yes, what's happening now is that if there's more than one foreground (i.e. visible) process their LMK score is adjusted in LRU order starting from the third one. So for example, if we have a simple two-app activity chain A (opener) -> B (activity) then both process will have the same score. If we have three A (opener) -> B (activity/opener) -> C (activity) then the older one (A) will get a lower priority and so on for longer chains.
Flags: needinfo?(gsvelto)
Attachment #8576924 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Attachment #8576926 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Depends on: 1119277
Blocks: 852925
(In reply to Gabriele Svelto [:gsvelto] from comment #60)
> (In reply to Alive Kuo [:alive][NEEDINFO!] from comment #59)
> > So is it fine to send any activity opener to background for now?
> 
> Not yet, I still rely on the activity opener to be visible to assign it the
> right priority. This patch is only a stop-gap measure so that we have a fix
> for v2.2. For v3 I still want to change this so that most of the priority
> decisions will be taken by the system app in order to remove the workarounds
> we have in place in the process priority manager.
> 
> > I read
> > comment 36 but I am not sure it's safe to do this right now; it seems to me
> > you are doing something to all "foreground" process but I am afraid this
> > won't apply if I start to send the opener to background.
> 
> Yes, what's happening now is that if there's more than one foreground (i.e.
> visible) process their LMK score is adjusted in LRU order starting from the
> third one. So for example, if we have a simple two-app activity chain A
> (opener) -> B (activity) then both process will have the same score. If we
> have three A (opener) -> B (activity/opener) -> C (activity) then the older
> one (A) will get a lower priority and so on for longer chains.

Thanks for explaining.
That is to say, we need a new bug for v3, and this one does not block bug 1102675.
No longer blocks: 1102675
(In reply to Alive Kuo [:alive][NEEDINFO!] from comment #61)
> Thanks for explaining.
> That is to say, we need a new bug for v3, and this one does not block bug
> 1102675.

Yes, absolutely. In fact it would be even better if we could talk a bit more about how to move this forward once the 2.2 release is behind us.
Keywords: checkin-needed
Blocks: 1144132
Alive, would you mind update the information here so we know that to ask next? Thanks!

https://wiki.mozilla.org/User:Timdream/Blocking_Features#Activity_OOM_priority:_bug_892371
Flags: needinfo?(alive)
Updated
Flags: needinfo?(alive)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: