Closed Bug 1081871 Opened 10 years ago Closed 10 years ago

Evaluate using cgroups instead of nice values for prioritizing processes

Categories

(Core :: Hardware Abstraction Layer (HAL), defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla38
feature-b2g 2.2+
Tracking Status
firefox36 --- wontfix
firefox37 --- wontfix
firefox38 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

Details

Attachments

(2 files, 10 obsolete files)

(deleted), patch
gsvelto
: review+
Details | Diff | Splinter Review
(deleted), patch
gsvelto
: review+
Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #1061969 +++ In ye good ole days of FxOS 1.0 in bug 861441 we were trying to fix an issue we found when using setpriority() to adjust a process' priority: the syscall would adjust only the process' main thread priority but leaves others unaffected. The most robust way to solve that particular problem was to use control groups (cgroups) which allow allocating CPU usage using a set of weights associated with every group and moving processes atomically between those. We even had a WIP patch for that but unfortunately the ICS kernels we were using at the time shipped with cgroups support but it turned out to be comically broken (if you had groups A and B moving a process to group A didn't have any effect, then moving it to group B would actually made it behave as if it was in group A and so on if you moved it back). Now that we're solidly (so to speak) on KK kernel we might want to try them again and eventually replace the nice-based priority adjustment code if it works.
Gabriele, Are you investigating this? If not, Dave could possibly take this up once we know the relative priority of this over other things planned in 2.2
Flags: needinfo?(gsvelto)
(In reply to Hema Koka [:hema] from comment #1) > If not, Dave could possibly take this up once we know the relative priority > of this over other things planned in 2.2 Yes, I was planning on resurrecting my old patch from bug 861441 with the needed modifications and testing it. However since I'm working on 2.0 and 2.1 dialer blockers right now this will have to wait a few days before I can free enough time to work on it.
Flags: needinfo?(gsvelto)
Thanks Gabriele!
Blocks: 994518
Quick note so that I don't forget. Once this lands we should update the documentation here: https://developer.mozilla.org/en-US/Firefox_OS/Platform/Architecture#Processes_and_threads
I've got a patch almost ready for this but I've stumbled upon a problem I hadn't anticipated. Android is already using cgroups for prioritization with a simple policy: it splits application between system stuff, regular apps and background apps. Then it creates a simple cgroup tree that allocates respectively 50, 45 and 5% of the CPU time to those groups of apps. This isn't an issue in and by itself but it somewhat becomes one because Android also deals with threads using those same groups: The set_sched_policy() function, part of the libcutils library, puts threads in specific cgroups according to their priority. This function seems to be used by a number of Android libraries some of which we use ourselves. If we create our cgroup hierarchy and then start assigning tasks to it we'll be moving those threads around but it's hard to guarantee that one of the libraries won't move them again outside of our control. Solving this issue is non-trivial because in some cases those assignments are done for good reason (see bug 962932 for example). One way to solve this issue would be to piggy-back on Android's existing cgroup hierarchy and tweak it for our purposes but it's not a pretty fix. I'll have to think more about this before resuming the coding.
After taking some time to come up with a proper design for this I've decided to try the following approach: - We'll have a control group for every priority level that currently has a specific nice value assigned to; this includes the foreground high, foreground, background perceivable and background priority levels. - We'll piggy back on existing Android control groups to play nice with existing libraries (as well as gonksched). This isn't a big issue since Android only distinguishes between foreground and background applications allowing us to integrate the existing cgroups in our model. - All control groups will be created and initialized at Gecko's startup. Some or all groups might already exist (they are created in init scripts) but this will guarantee that they're always present. - The implementation of the HAL SetProcessPriortiy() function currently takes two parameters: one for general priority (aPriority) and one for CPU priority only (aCPUPriority). The latter takes only two values (high and low) and was introduced to cope with the scenario were we had a single, high priority task and we needed to lower the CPU priority of everything else including other foreground applications. The concrete scenario was receiving an incoming call while browsing on a very heavyweight page. Just boosting the priority of the critical process wasn't enough to make it fast enough (as we could decrease its nice value by only one point) and switching the existing foreground process into the background took too long. With control groups we won't have this problem anymore as we can create a high-priority control group which will immediately boost the priority of a process above all others so I plan on removing this parameter in a follow up; this will cut quite a bit of complexity out of the process priority manager. Finally here's how the control group hierarchy will look like: * /dev/cpuctl - top-level control group, cpu.shares set to 1024 as per default, system daemons and the main b2g process (i.e. MASTER priority) will live inside of it. * /dev/cpuctl/apps - group for applications with FOREGROUND priority, cpu.shares set to 1024 so that we balance CPU usage evenly between system tasks and application functionality. This is one of Android's predefined groups. * /dev/cpuctl/apps/critical - group for applications with FOREGROUND_HIGH priority, cpu.shares set to 16384 so that when one app is in the group it will get 16 times more CPU time than regular foreground apps. * /dev/cpuctl/apps/bg_non_interactive - group for applications with BACKGROUND priority, cpu.shares set to 52 so that apps in this group will get ~20 times less CPU than foreground apps. This is one of Android's predefined groups. * /dev/cpuctl/apps/bg_perceivable - group for applications with BACKGROUND_PERCEIVABLE priority, cpu.shares set to 103 so that apps in this group will get ~10 times less CPU than foreground apps and two times as much as regular background apps. This will need some testing and the final values might require further fine tuning. Last but not least I've yet to decide how to set the cpu.notify_on_migrate flag in the various control groups. When this flag is set to 1 for a group, whenever a thread belonging to that group is scheduled to run on a different CPU core than the last one it was on, then the CPU scaling governor is notified to automatically boost the clock of that CPU. This flag is set to 1 by default in the apps control group by Android. The idea is to ensure that interactive applications won't be slowed down when migrating between cores; this is probably very relevant for quad-core based devices. My idea is to have this flag set in the apps group obviously, but also in the apps/high one. However I'm wondering how to deal with the main b2g process. The main process would probably benefit from it but in my current design it's lumped together with other non-interactive system services in the top-level group which doesn't have this flag set. I might consider creating a separate group just for the main process to address this case but I want to do some testing first before making our hierarchy more complex.
This is just a small cleanup patch that gets rid of a number of annoying warnings I was seeing while building the code. It also removes a dead function and cleans up a little coding style issues. This is not strictly required for this bug but it's nice to have.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
This is the actual patch. It's incomplete as I've put together the necessary infrastructure to set up and create cgroups; this is more to get a general feeling of how this will look like. The patch has received minimal testing at this point; it builds and the phone boots but the groups are not created nor any process is assigned to them yet.
Attachment #8539287 - Flags: feedback?(dhylands)
Adding the feature flag here as this falls under the 2.2 scope per https://bugzilla.mozilla.org/show_bug.cgi?id=1082290#c2. bug 1082290 is in 2.2.
feature-b2g: --- → 2.2+
Comment on attachment 8539287 [details] [diff] [review] [PATCH 2/2 WIP] Use cgroups instead of nice values to implement application priorities This looks like it's a copy of the first patch?
(In reply to Jed Davis [:jld] from comment #10) > This looks like it's a copy of the first patch? Yes, I loaded the wrong one :-/ Updated real patch coming soon...
This patch implements all the functionality I described in comment 6 but could still use some polishing. First of all what works: - All priority levels now have cgroup prefs instead of nice ones, for each cgroup we create we set both the cpu.shares parameter - to assign the appropriate chunk of CPU time to the group members - as well as the cpu.notify_on_migrate parameter which is used for interactive applications to ensure that when a thread is moved between cores, the receiving core frequency is immediately increased to match the previous one. We never had this feature and it's quite important on multi-core devices. - I've encapsulated each priority class into an object which sets its parameters according to the prefs and ensures that the associated group is properly initialized. - Finally I've modified SetProcessPriority() to pick the correct priority class and use its parameters to move the process to the appropriate cgroup. I've done a quick test on my Flame using the 18D base image and the patch seems to work correctly. I still have to investigate a couple of warnings though. There's also a couple of things that need to be polished: - Currently to assign a process to a cgroup we open its cgroup.procs file, write the PID inside of it and then close it again. Opening and closing the file every time is useless; we can open it once when we create the priority class and then just write() to it when we need to assign a process. - Due to the change above I'd like to move the actual implementation of SetProcessPriority() inside of the priority class objects; this provides better encapsulation and is cleaner. - Last but not least I'd like to split the process priority & LMK management functionality into a separate file. GonkHal.cpp has grown enormously and it's not easy to manage. I'd do this in a third patch though, to avoid complicating the review of this patch.
Attachment #8539287 - Attachment is obsolete: true
Attachment #8539287 - Flags: feedback?(dhylands)
Attachment #8542267 - Flags: feedback?(dhylands)
While testing the previous version of the patch I've located and addressed a couple of problems: - The FOREGROUND_KEYBOARD group didn't have its cgroup assigned, this was fixed - The KillUnderKB parameter wasn't being read properly from the prefs, this essentialy broke LMK functionality. Fortunately the kernel was complaining loud enough that I spotted the issue quickly. Additionally I've moved handling the PID assignment to a cgroup into the PriorityClass object and made it cache the cgroup.procs file descriptor. This makes adjusting the priority of a process require a single syscall (write()ing the PID into the cgroup.procs fd); this is a marked improvement over the nice-based system which required 10s of them (and was racy on top).
Attachment #8542267 - Attachment is obsolete: true
Attachment #8542267 - Flags: feedback?(dhylands)
Attachment #8542993 - Flags: review?(dhylands)
Refreshed patch.
Attachment #8542263 - Attachment is obsolete: true
A few more changes including a fix for setting the properties of the top-level control-group and a couple of minor cleanups.
Attachment #8542993 - Attachment is obsolete: true
Attachment #8542993 - Flags: review?(dhylands)
Attachment #8543392 - Flags: review?(dhylands)
Last revision, I forgot adding an overloaded assignment operator for the ProcessPriority class, just in case.
Attachment #8543393 - Attachment is obsolete: true
Attachment #8543407 - Flags: review?(dhylands)
Blocks: 1082290
Comment on attachment 8543392 [details] [diff] [review] [PATCH 1/2 v2] Remove dead code, quiet some warnings and do some minor code style cleanups Review of attachment 8543392 [details] [diff] [review]: ----------------------------------------------------------------- Looks good.
Attachment #8543392 - Flags: review?(dhylands) → review+
Comment on attachment 8543407 [details] [diff] [review] [PATCH 2/2 v3] Use cgroups instead of nice values to implement application priorities Review of attachment 8543407 [details] [diff] [review]: ----------------------------------------------------------------- Excellent. Looks great.
Attachment #8543407 - Flags: review?(dhylands) → review+
Thanks for the review Dave! I'm rebasing the patches since I've been bit-rotted by bug 1073003 and I'll post them to try right away.
Rebased patch, carrying over the r=dhylands flag.
Attachment #8543392 - Attachment is obsolete: true
Attachment #8545933 - Flags: review+
Rebased patch, carrying over the r=dhylands flag.
Attachment #8543407 - Attachment is obsolete: true
Attachment #8545934 - Flags: review+
The try run is here, keeping my fingers crossed: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=e721d799fad6
Don't we still need to keep the old way around? If we know that 2.2 will only run on CGROUP enabled kernels, then we're fine. But if we run on a kernel with CGROUP_SCHED=n then I think we need to fallback to using nice values.
Blocks: 1119277
(In reply to Dave Hylands [:dhylands]{PTO until Jan 5) from comment #24) > Don't we still need to keep the old way around? > > If we know that 2.2 will only run on CGROUP enabled kernels, then we're fine. All kernels I've encountered on commercial devices as well as prototypes - starting with the ICS-based ones - have cgroup support enabled. The only ones that don't are the Keon and Peak where it seems to me that the kernel was deliberately modified by GP to remove cgroup support since we weren't using it. I said it was done deliberately because the init scripts on both phones still try to mount the cgroup pseudo-filesystem and populate it. I was able to rebuild the kernel on both phones with cgroup support and it works on both so I think we can ask GP to address that (it's a matter of partially reverting their commit with hash d6e55d7bb97b5bb686edb5a51cf1de7933535b75). However there's a bigger problem we'll be facing: the ICS kernel has a bizarre bug when working with the cgroup.procs file. If you write a PID into it, the process will be migrated into the group (the group's tasks file will contain all the process' TIDs, etc...) but in practice the scheduling will not change. To make matters worse, when moving the process to another group the scheduling policy of the previous one will be enforced. This can be reproduced easily on a hamachi for example, in an adb shell you might try the following: (while : ; do : ; done) & pid1=$! (while : ; do : ; done) & pid2=$! echo $pid1 > /dev/cpuctl/bg_non_interactive/cgroup.procs Check CPU allocation at this stage, both process will be getting the same amount of CPU which is wrong, the $pid1 process should get only ~5% since it's been moved to the bg_non_interactive group. Then do the following: echo $pid1 > /dev/cpuctl/cgroup.procs Now check again, the $pid1 process will now be allocated ~5% of the CPU as if it were in the bg_non_interactive group. There are two possible workarounds for this: manually moving all of a process' TIDs into the group tasks file (which works, but is racy and kludgy) or for every group transition moving a process into the desired group and then back into the top-level one (which is also kludgy, but not racy). I wanted to address this in a follow-up though because IIRC there won't be any more ICS-based devices in the future and thus I don't think it's worth keeping the nice-based logic around. Additionally removing the nice-based logic entirely allows us to greatly simplify the process priority manager, see bug 1119277 which I've just filed as a follow-up.
Could we create parallel dummy cgroups which have the same attribute as the intended cgroup, move the process to the dummy cgrou and then move it to the intended one?
(In reply to Dave Hylands [:dhylands] from comment #26) > Could we create parallel dummy cgroups which have the same attribute as the > intended cgroup, move the process to the dummy cgrou and then move it to the > intended one? Yes, that would work and it would be less hacky than my idea of moving a PID into a group and then back in the default one. It would still work in the case we'd encounter an ICS device with a working cgroup implementation. I'll open a follow-up for it.
Blocks: 1119467
Try is green and I've shot a warning message to dev-b2g and dev-gaia to inform everybody about the changes here. Let's hope we don't hit any regressions, but if we do we'll back out mercilessly. Time to land.
Keywords: checkin-needed
Blocks: 1119567
Blocks: 1119569
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Depends on: 1120166
No longer depends on: 1120166
This needs to be backed out because of bug 1122119. Bug 1082290 also needs to be backed out as it depends on this one and it's impossible to remove one but not the other patch.
Keywords: checkin-needed
After this patch backed out, these two bugs CANNOT be reproduced: Bug 1120889 - [Flame][Bluetooth] Device pairing fails while we share audio via Bluetooth in Music. Bug 1122177 - [Bluetooth] Attempting to pair while device is searching for other devices will result in endless pairing
So, after giving this some more thought I've decided to update my patch by keeping both the old nice-based priority and the new cgroup approach and using the same overall code structure I had before so we can still land bug 1082290 on top of it. The cgroup support will be pref'd off by default; this will give us the ability to try it out in the future and maybe enable it selectively on devices where we can prove it works fine.
Johan, since you could reproduce this easily could you try the following for me? Make a build with both attachment 8545933 [details] [diff] [review] and attachment 8545934 [details] [diff] [review] applied and this patch on top. I'm trying to narrow down what's causing the problem and I'm now suspecting the cpu.notify_on_migrate parameter. It's a useful feature that informs the kernel to raise the frequence of a core if a thread is migrated on it from another core already running at an higher frequency; but it's also Android-specific and not in the mainstream Linux kernel so I suspect it might not be well tested. This patch disables it for all groups so if you can't reproduce the issue with it applied then we'd have narrowed it down the crashes to that parameter.
Flags: needinfo?(jlorenzo)
Based on Gabriele's and Johan's observations, specifically in bug 1122119 comment 74, I did a tentative of running the very same tests than johann but with notify_on_migrate disabled. Prior to this, I could reproduce a crash after 25 minutes. So far, my Flame just crossed 40 mins without any crash. I'll continue investigating this today, since there are two prefs to control notify_on_migrate. In the current run that is still stable after 40 mins, both were disabled.
My Flame is now not exposing the issue for more than 60 mins. I have been lurking a little bit on CAF's kernel side, and I found those two latest commits being interesting, around kernel/sched/core.c: https://www.codeaurora.org/cgit/quic/la/kernel/msm/log/kernel/sched/core.c?h=LA.BF.1.1.1.c2 I'm currently rebuilding to investigate how those may be fixing the issue.
My flame crashed after 35 mins :(
First of all I'd like to thank Alexandre for doing a lot of testing yesterday (sunday!) on the cpu.notify_on_migrate parameter and confirming that it's indeed what causing the kernel panics. He also identified which upstream patches might help us here (see comment 37). Turns out that my hunch in comment 35 was right and the kernel code handling that parameter (cpufreq adjustments in response to thread migrations) is horribly buggy. As per comment 37 and comment 38 there's numerous patches to fix those bugs available but none of those seem to solve the problem entirely. This patch turns off the cpu.notify_on_migrate option for all groups. Not only this prevents the bug from happening with the other cgroup changes applied but it's also likely to increase the stability of our Flames. That's because the init scripts were already setting /dev/cpuctl/apps/cpu.notify_on_migrate to 1 and while that group was only used by a handful of threads spawned by the base system it might have been enough to cause issues. With this patch applied that will be taken care off. Since the pref is still present vendors can turn it on if they feel confident it will work on the kernel they ship.
Attachment #8545934 - Attachment is obsolete: true
Attachment #8557132 - Attachment is obsolete: true
Flags: needinfo?(jlorenzo)
Attachment #8557947 - Flags: review+
Awesome
I've got a whitescreen on one of the lab phones today, similar to the previous effect, running b2g-inbound smoketests. 13:26:46 Showing version details 13:26:52 0:02.45 LOG: MainThread mozversion INFO application_buildid: 20150203111724 13:26:52 0:02.45 LOG: MainThread mozversion INFO application_changeset: 166db45bb67f 13:26:52 0:02.45 LOG: MainThread mozversion INFO application_display_name: B2G 13:26:52 0:02.45 LOG: MainThread mozversion INFO application_id: {3c2e2abc-06d4-11e1-ac3b-374f68613e61} 13:26:52 0:02.45 LOG: MainThread mozversion INFO application_name: B2G 13:26:52 0:02.45 LOG: MainThread mozversion INFO application_remotingname: b2g 13:26:52 0:02.45 LOG: MainThread mozversion INFO application_repository: https://hg.mozilla.org/integration/b2g-inbound 13:26:52 0:02.45 LOG: MainThread mozversion INFO application_vendor: Mozilla 13:26:52 0:02.45 LOG: MainThread mozversion INFO application_version: 38.0a1 13:26:52 0:02.45 LOG: MainThread mozversion INFO build_changeset: e06971db7acf7a35c32eb74d675a4e12e288e6be 13:26:52 0:02.45 LOG: MainThread mozversion INFO device_firmware_date: 1422994100 13:26:52 0:02.46 LOG: MainThread mozversion INFO device_firmware_version_base: L1TC100118D0 13:26:52 0:02.46 LOG: MainThread mozversion INFO device_firmware_version_incremental: eng.cltbld.20150203.150810 13:26:52 0:02.46 LOG: MainThread mozversion INFO device_firmware_version_release: 4.4.2 13:26:52 0:02.46 LOG: MainThread mozversion INFO device_id: flame 13:26:52 0:02.46 LOG: MainThread mozversion INFO gaia_changeset: dfebaaa8aab43470f482d09d71137bab840c3ae9 13:26:52 0:02.46 LOG: MainThread mozversion INFO gaia_date: 1422989380 13:26:52 0:02.46 LOG: MainThread mozversion INFO platform_buildid: 20150203111724 13:26:52 0:02.46 LOG: MainThread mozversion INFO platform_changeset: 166db45bb67f 13:26:52 0:02.46 LOG: MainThread mozversion INFO platform_repository: https://hg.mozilla.org/integration/b2g-inbound 13:26:52 0:02.46 LOG: MainThread mozversion INFO platform_version: 38.0a1 I'm looking through logs to understand what's landed, but just as a sanity check, any chance it's related?
Flags: needinfo?(gsvelto)
(to be clear, I can tell that it looks like this update hasn't landed, but I just want to make sure I'm not misreading logs.)
(In reply to Geo Mealer [:geo] from comment #43) > (to be clear, I can tell that it looks like this update hasn't landed, but I > just want to make sure I'm not misreading logs.) I haven't relanded the patch yet but the functionality that was causing the kernel panic is enabled even without my previous patch applied so it might be related. It might also be due to another problem, we've seen kernel panics related to silk too and the white screen is just a symptom of a kernel panic. Until bug 1025265 lands we are unable to tell what's the exact cause of a panic. The new patch here turns off the problematic functionality for good and should hopefully shield us from further problems related to it. Note that the white screen condition also seems to be a
Flags: needinfo?(gsvelto)
On this topic, the try run was green. Let's try to land this again and see if it sticks.
Keywords: checkin-needed
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
No longer blocks: 1119567
Comment on attachment 8557947 [details] [diff] [review] [PATCH 2/2 v5] Use cgroups instead of nice values to implement application priorities NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): New feature scheduled for inclusion in 2.2 User impact if declined: Application priority management will be slightly less effective, this won't be visible to a user except in specific situations such as when the phone is under a very heavy load. Not including this will also prevent us from landing further related improvements in 2.2 such as bug 1082290 Testing completed: This was manually tested on multiple devices and is now running as part of nightly. The issues described in bug 1122119 haven't resurfaced with the new iteration of this patch. Risk to taking this patch (and alternatives if risky): This patch heaveily exercises some kernel features we haven't used before. As bug 1122119 shows this involve a measure of risk of encountering kernel bugs. We've reduced the scope of this patch to only use features that have been in the Linux kernel for a long time and disabled all the more experimental bits. Those can be re-enabled by a vendor if they're willing to take more risks. String or UUID changes made by this patch: None
Attachment #8557947 - Flags: approval-mozilla-b2g37?
Comment on attachment 8545933 [details] [diff] [review] [PATCH 1/2 v3] Remove dead code and do some minor code style cleanups See comment 48 for the approval request, this only contains cleanups on which the following patch was built.
Attachment #8545933 - Flags: approval-mozilla-b2g37?
Blocks: 1131582
Comment on attachment 8545933 [details] [diff] [review] [PATCH 1/2 v3] Remove dead code and do some minor code style cleanups Approving this as this is a committed feature for 2.2. In terms of mitigating risks, NI :walter and Paul who will be running the MTBF testing and may get into bugs that be related to this landing (kernel panic, white screen, crahes etc). If you need any additional help form QA to test this on device please flag me.
Attachment #8545933 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Flags: needinfo?(ypwalter)
Flags: needinfo?(pyang)
Attachment #8557947 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Flags: needinfo?(ypwalter) → needinfo?(wachen)
Is it patched in v2.2? we need v2.2 for MTBF testing.
Will start once it lands.
Flags: needinfo?(pyang)
not seeing this for a long time. Gaia-Rev 389542b71c89253c0d176d3b0bfb54e275c19bf1 Gecko-Rev https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/81bcdde1e783 Build-ID 20150223162505 Version 37.0 Device-Name flame FW-Release 4.4.2 FW-Incremental eng.cltbld.20150223.202034 FW-Date Mon Feb 23 20:20:45 EST 2015 Bootloader L1TC100118D0
Status: RESOLVED → VERIFIED
Flags: needinfo?(wachen)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: