Closed Bug 861441 Opened 12 years ago Closed 12 years ago

GonkHal::SetPriority does not set CPU priorities properly

Categories

Product/Component: Core :: Hardware Abstraction Layer (HAL)
Hardware: ARM
OS: Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: 1.0.1 Cert2 (21may)
blocking-b2g: tef+
Tracking Status
firefox21 --- wontfix
firefox22 --- wontfix
firefox23 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: justin.lebar+bug, Assigned: gsvelto)

References

Details

(Whiteboard: [madrid])

Attachments

(4 files, 7 obsolete files)

(deleted), patch
(deleted), patch (glandium: review+)
(deleted), patch
(deleted), patch (gsvelto: review+)
We discovered in bug 860526 that the way GonkHal::SetPriority sets CPU priorities is busted. GonkHal::SetPriority tries to set the nice value of a given process by doing setpriority(pid, nice), but in fact setpriority(pid) affects only the process's main thread. If you want to set the priority of all of the process's threads, you have to call setpriority() once for each of the process's threads. See http://jlebar.com/2013/4/11/Beware_of_renice.html for more details on this problem.

I've experimented with renicing all threads (and I'll post a patch), but even still my whole-system profiler (bug 860273) shows reniced child processes getting a lot of CPU time when we receive a call under process-buster stress (http://bit.ly/membuster).

It's possible that everything is working properly and the child processes are not slowing the system down. It's also possible that the relatively large number of low-priority threads are sucking up enough CPU to slow down receiving a call by a few seconds (13s without load, 17s with load).

We may want to experiment here with alternatives to renicing all the threads. For example, attachment 736993 [details] [diff] [review] uses cgroups, which have the advantage of (hopefully) changing the priority of all the threads in one process at once. At the very least this should let us avoid the race conditions which occur when we try to renice threads individually. It may also give us more control over how tasks get scheduled.

dhylands suggested we may want to look at assigning realtime priorities to these high-priority processes. That might be sensible, but I'm afraid we may still need to renice each of the threads individually, with all the problems that implies.
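For illustration, here is a minimal sketch (not the attached patch) of what "renice every thread" means on Linux: since setpriority(PRIO_PROCESS, tid, nice) only affects a single thread, you have to walk /proc/<pid>/task and renice each TID. The function name and error handling are illustrative only, and the walk is inherently racy, which is exactly the problem described above.

#include <dirent.h>
#include <sys/resource.h>
#include <cstdio>
#include <cstdlib>

// Illustrative sketch: set the nice value of every thread in a process.
// Threads can be created or exit while we walk /proc/<pid>/task, so this
// is inherently racy.
static bool ReniceAllThreads(int aPid, int aNice)
{
  char path[64];
  snprintf(path, sizeof(path), "/proc/%d/task", aPid);

  DIR* tasks = opendir(path);
  if (!tasks) {
    return false;
  }

  while (struct dirent* de = readdir(tasks)) {
    char* end = nullptr;
    long tid = strtol(de->d_name, &end, 10);
    if (*end || tid <= 0) {
      continue;  // skip "." and ".."
    }
    // On Linux, PRIO_PROCESS with a TID affects only that thread.
    if (setpriority(PRIO_PROCESS, static_cast<int>(tid), aNice)) {
      // The thread may have exited in the meantime; not necessarily fatal.
    }
  }

  closedir(tasks);
  return true;
}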
blocking-b2g: --- → tef+
Hi Alan, need your help on this case. -- Keven
Assignee: nobody → ahuang
Hi, Keven. Gabriele and I are working on this. We've been discussing it in great depth in other bugs. I don't think we need help at the moment. I'm going to take this and Gabriele may steal it from me.
Assignee: ahuang → justin.lebar+bug
Yup, I'm cooking up a variation of attachment 736993 [details] [diff] [review] with a third cgroup dedicated to the PROCESS_PRIORITY_FOREGROUND_HIGH priority level. With this change, if all three groups (foreground high, foreground and background) contain processes running at full steam, the CPU usage should be split as follows:

PROCESS_PRIORITY_BACKGROUND*      10%
PROCESS_PRIORITY_FOREGROUND       40%
PROCESS_PRIORITY_FOREGROUND_HIGH  50%

I'm still unsure where to put the PROCESS_PRIORITY_BACKGROUND_PERCEIVABLE processes. I'm under the impression that we want to dedicate some time to them above the other background tasks, so maybe a better split would be to give them their own cgroup.

As a side note, Android cgroups are created as part of the init.rc scripts, probably because they're needed before various services start executing. In our case I think it might be an option to create them when starting the b2g main process (if they don't already exist), since that's where they will be used from. This has a couple of nice advantages:

- they'd live in the same piece of code that uses them, so it's easier to keep them in sync
- we can set the cpu shares for every group using preferences from b2g.js, which allows for simple tweaking
- we don't need to add more stuff to the init.b2g.rc script in gonk-misc
Hello Gabriele, I am looking at attachment 736993 [details] [diff] [review]. If we want to apply a new scheduling policy to a task, we could call the androidSetThreadSchedulingGroup() API to set the task's cgroup. Something like the relevant part of attachment 682352 [details] [diff] [review].
(In reply to Alan Huang [:ahuang] from comment #5)
> I am looking at attachment 736993 [details] [diff] [review]. If we want to
> apply a new scheduling policy to a task, we could call the
> androidSetThreadSchedulingGroup() API to set the task's cgroup. Something
> like the relevant part of attachment 682352 [details] [diff] [review].

From what I can tell androidSetThreadSchedulingGroup() allows using only the pre-existing control groups provided by Android (fg_boost, bg_non_interactive). When discussing the issue with Justin we came to the conclusion that this is probably not enough to fix the problem we're having (see bug 860526 comment 23), so we need more control groups to be available and that would rule out the use of androidSetThreadSchedulingGroup().
(In reply to Justin Lebar [:jlebar] from comment #2)
> Hi, Keven.
>
> Gabriele and I are working on this. We've been discussing it in great depth
> in other bugs. I don't think we need help at the moment.
>
> I'm going to take this and Gabriele may steal it from me.

Hi Justin, I am just thinking that Alan can help and can be more involved. Maybe we can meet each other in Madrid. :) -- Keven
Quick update: the good news is that after studying the topic a little bit more I figured out that control groups can be created hierarchically. This simplifies implementing our priority classes, for example (warning - ugly ASCII art ahead):

*
|
+- background (10% of total CPU time)
|  |
|  +--- perceivable (50% of background)
|  |
|  +--- homescreen (?)
|
+- foreground (90% of total CPU time)
   |
   +--- high (75% of foreground)

The bad news is that for some reason neither my custom-generated control groups nor the default Android ones seem to be currently enforced. Populating every group with a busy process leads to even CPU usage among all processes. Something must be amiss in the kernel configuration and I hope to be able to fix it quickly in order to test this.
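For reference, a rough sketch of what creating such a hierarchy can look like, assuming the cpu controller is mounted at /dev/cpuctl as on Android. The helper names are made up and the cpu.shares values are placeholders (the values actually used in the patch are discussed later in the bug).

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Illustrative: write a string into a cgroup pseudo-file.
static bool WriteCGroupFile(const char* aPath, const char* aValue)
{
  int fd = open(aPath, O_WRONLY);
  if (fd < 0) {
    return false;
  }
  ssize_t rv = write(fd, aValue, strlen(aValue));
  close(fd);
  return rv == static_cast<ssize_t>(strlen(aValue));
}

// Create one cgroup under the cpu controller and give it a cpu.shares value.
// Assumes the controller is mounted at /dev/cpuctl (as in Android's init.rc).
static bool MakeCGroup(const char* aName, const char* aShares)
{
  char path[128];
  snprintf(path, sizeof(path), "/dev/cpuctl/%s", aName);
  if (mkdir(path, 0755) && errno != EEXIST) {
    return false;
  }
  snprintf(path, sizeof(path), "/dev/cpuctl/%s/cpu.shares", aName);
  return WriteCGroupFile(path, aShares);
}

// Nested groups are just nested directories, created parent-first.
// The share values here are placeholders, not a final configuration.
static void CreateHierarchy()
{
  MakeCGroup("background", "103");
  MakeCGroup("background/perceivable", "103");
  MakeCGroup("foreground", "921");
  MakeCGroup("foreground/high", "1381");
}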
Hi, Kevin. Adding developers to a late software project just makes it later.
I will describe the weird behavior I'm currently seeing when using cgroups, because even if I made some progress with them I'd like to ensure that they work reliably and not by pure chance.

For the cgroups usage I'm referring directly to the kernel documentation: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/cgroups/cgroups.txt?id=refs/tags/v3.8

These lines are relevant in particular:

 - tasks: list of tasks (by PID) attached to that cgroup. This list is not
   guaranteed to be sorted. Writing a thread ID into this file moves the
   thread into this cgroup.
 - cgroup.procs: list of thread group IDs in the cgroup. This list is not
   guaranteed to be sorted or free of duplicate TGIDs, and userspace should
   sort/uniquify the list if this property is required. Writing a thread
   group ID into this file moves all threads in that group into this cgroup.

By using the standard Android cgroups I did the following testing (on both the Unagi and the Otoro):

* open a shell on the device
* spawn two processes both trying to consume 100% of the CPU; I used this shell command:
  ( while : ; do : ; done ) &
* say the PIDs for both processes are 1001 & 1002; check via top that they're both using the same amount of CPU
* move the first process into the bg_non_interactive cgroup:
  echo 1001 > /dev/cpuctl/bg_non_interactive/cgroup.procs
* verify that process 1001 appears in both /dev/cpuctl/bg_non_interactive/cgroup.procs and /dev/cpuctl/bg_non_interactive/tasks; top's PCY field says "bg" for the process
* CPU usage should now be ~95% for process 1002 and ~5% for process 1001, but I still get 50/50 instead
* move process 1001 back into the top-level cgroup:
  echo 1001 > /dev/cpuctl/cgroup.procs
* verify as above; now the CPU split should be 50/50, but instead I get 5/95 as if process 1001 were still in the bg_non_interactive group

Doing the same test but writing the PID into the "tasks" file instead of "cgroup.procs" works correctly, giving me the expected 5/95 split when moving the first process into the bg_non_interactive group. Doing so is not a valid solution for our problem though, as the kernel documentation clearly states that "tasks" is used for moving threads, not processes, into a cgroup.

Now, the only difference I could find between tasks and cgroup.procs is that the former has its permissions and ownership explicitly changed in init.rc on our devices:

chown system system /dev/cpuctl/bg_non_interactive/tasks
chmod 0777 /dev/cpuctl/bg_non_interactive/tasks

I'm trying to figure out if this is causing the weird behavior I'm seeing or if cgroups manipulation works differently in the Android kernel compared to a stock Linux kernel.
I've spent the entire day trying to figure out the issue I'm seeing with cgroups and I came to these conclusions:

- The behavior of the "cgroup.procs" and "tasks" pseudo-files superficially seems to match the documentation: moving a PID to cgroup.procs moves all its threads into that cgroup, while moving a PID into tasks moves only the specified thread. This can all be verified by printing out the contents of the relevant cgroups or inspecting /proc/<pid>/cgroup and /proc/<pid>/task/<tid>/cgroup.

- As far as scheduling goes things diverge completely: moving PIDs into the "tasks" file works as advertised, while moving PIDs into "cgroup.procs" exhibits the weird behavior I described above. Essentially, every time you move a PID into a new cgroup the process gets the scheduling policy of the *previous* cgroup it was in.

I've investigated the Android code and it seems to only ever use the "tasks" pseudo-file and never "cgroup.procs"; that's a pretty strong hint that they may have run into this issue too.
> For the cgroups usage I'm referring directly to the kernel documentation:

FWIW it may be worth looking at the docs for our kernel's version, instead of v3.4.

Anyway, this sounds like a bug in the kernel. Maybe there's a patch we can backport. Changing the kernel at this point isn't scary at all, right?

It sounds like we have three options at this point.

1) Nice all the threads. This is racy, and it also does not improve phone call response time much over nicing only the processes, in my tests.

2) Move all the threads into a cgroup. This is also racy, but maybe it will improve call response time over nice.

3) Fix the apparent bug in cgroup.procs.

I'm afraid that we'll do (2) and observe that the bg processes are still using a lot of CPU. This would indicate that the bg processes are using CPU because the phone is idle. And that would indicate to me that perhaps the bg processes are sending enough messages to the master process to cause it to do significant work on the child's behalf, and this is what's causing us to be slow when we receive a phone call.

If that's what is going on, none of these solutions may help, at least with the membuster testcase. With the "pan and zoom around on the page while the phone is receiving a call" testcase, we have fewer processes vying for the CPU, so that may be easier to address.
(In reply to Justin Lebar [:jlebar] from comment #12)
> FWIW it may be worth looking at the docs for our kernel's version, instead
> of v3.4.

I checked 3.0 too and the documentation had only minimal changes between versions.

> Anyway, this sounds like a bug in the kernel. Maybe there's a patch we can
> backport. Changing the kernel at this point isn't scary at all, right?

At this stage no :) I'll go through the kernel logs & bugzilla tracker and see if it's been documented somewhere.

> 2) Move all the threads into a cgroup. This is also racy, but maybe it will
> improve call response time over nice.

I will write a patch today doing this; if it works we might use it and open a follow-up to clean it up once the kernel bug gets fixed.

I had an alternative too but it's just as sticky and kludgy (if not more). My idea was that since apparently the kernel is applying the scheduling rules of the _previous_ cgroup a process has been in, I would always transition a process to the appropriate cgroup and then back to the top-level one. It *should* do the trick but it looks even uglier than lumping all the threads in /dev/cpuctl/*/tasks in one go.

> 3) Fix the apparent bug in cgroup.procs.

Will do that anyway, but first I'll prepare the patches.

> I'm afraid that we'll do (2) and observe that the bg processes are still
> using a lot of CPU. This would indicate that the bg processes are using CPU
> because the phone is idle. And that would indicate to me that perhaps the
> bg processes are sending enough messages to the master process to cause it
> to do significant work on the child's behalf, and this is what's causing us
> to be slow when we receive a phone call.

My hope is that since cgroups enforce a hard cap on CPU usage, once you relegate all threads to a limited share they won't be able to hammer the b2g process too much with requests, leaving the communications app to do its job.
This is a tentative patch using cgroups instead of nice values. I've added four custom cgroups to map our priorities; they are currently hard-coded and created the first time SetProcessPriority() is called if they don't exist already:

- background: 10% of total CPU time
- background/perceivable: 50% of the background CPU time
- foreground: 90% of total CPU time
- foreground/high: 75% of the foreground CPU time

The b2g process and other system daemons are not accounted for in these cgroups as they stay in the top-level cgroup. This means that if the b2g process is running at full throttle, 50% of the CPU will be allocated to it even if there are many processes in the foreground group fighting for the CPU too.

The patch iterates over the TIDs of each process and moves them one by one to the destination control group using the "tasks" pseudo-file, to avoid the problem I've experienced with "cgroup.procs". The code doing this is sub-optimal at the moment but I'll fix it in an updated patch.

I've verified that the patch is moving PIDs correctly among the four groups and I've also verified that the CPU % caps are enforced. I will run the get-a-call-under-process-buster scenario using this patch and report my findings.
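A minimal sketch (not the attached patch) of the general approach described above: write each TID found under /proc/<pid>/task into the destination group's "tasks" pseudo-file. The group path, function name and error handling are illustrative.

#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// Illustrative: move every thread of aPid into /dev/cpuctl/<aGroup>/tasks,
// working around the cgroup.procs issue described in the previous comments.
static bool MoveProcessToCGroup(int aPid, const char* aGroup)
{
  char tasksPath[128];
  snprintf(tasksPath, sizeof(tasksPath), "/dev/cpuctl/%s/tasks", aGroup);

  int tasksFd = open(tasksPath, O_WRONLY);
  if (tasksFd < 0) {
    return false;
  }

  char taskDir[64];
  snprintf(taskDir, sizeof(taskDir), "/proc/%d/task", aPid);
  DIR* dir = opendir(taskDir);
  if (!dir) {
    close(tasksFd);
    return false;
  }

  while (struct dirent* de = readdir(dir)) {
    if (de->d_name[0] == '.') {
      continue;  // skip "." and ".."
    }
    // Writing a TID into "tasks" moves that single thread into the group.
    ssize_t len = strlen(de->d_name);
    if (write(tasksFd, de->d_name, len) != len) {
      // The thread may have exited already; keep going.
    }
  }

  closedir(dir);
  close(tasksFd);
  return true;
}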
Assignee: justin.lebar+bug → gsvelto
Status: NEW → ASSIGNED
Attachment #737981 - Flags: feedback?(justin.lebar+bug)
I've tried to reproduce the issue of bug 847592 both with and without this patch applied; the steps were the following:

- click on "bust processes" from http://bit.ly/membuster
- immediately after, call my mobile phone (I used my landline for that)
- when I could hear the first ring on the calling phone I started my stopwatch
- I stopped my stopwatch as soon as I could see the answer screen slide in

I repeated the test a few times for consistency; it was done on an Otoro with today's B2G/v1-train & mozilla-b2g18 sources.

Without the patch applied I could time at least four seconds of delay between the two; during my tests it was usually more than four seconds (4.4-4.5), never less. With the patch applied this dropped to around two seconds (1.9-2.1). I'm not sure if this is satisfactory but it definitely looks like an improvement. I'd like to try with the patch-set in bug 844323 applied to see if it buys us a bigger improvement.
Whiteboard: [madrid]
Here's an updated patch with a lot more polish:

- the control groups and their relative CPU shares can be defined via preferences
- values for the cgroups & OOM killer preferences have proper fallbacks so we don't crash if they are missing
- SetProcessPriority() no longer reads the prefs every time it's run, just the first time

I want to give it even more polish if possible. Right now every time we run SetProcessPriority() we're opening and closing the control groups' pseudo-files a zillion times, while they could be opened once during initialization and then just used.
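As a sketch of the "proper fallbacks" point, assuming Gecko's Preferences API: read the pref and keep a hard-coded default if it is missing, rather than crashing. The function name and pref name are placeholders, not the ones in the patch.

#include "mozilla/Preferences.h"

// Illustrative: read a cgroup cpu.shares value from a pref, falling back to a
// default so a typo in b2g.js can't take the whole priority manager down.
static int32_t GetCGroupSharesPref(const char* aPrefName, int32_t aDefault)
{
  int32_t value = aDefault;
  nsresult rv = mozilla::Preferences::GetInt(aPrefName, &value);
  if (NS_FAILED(rv)) {
    // Missing or mistyped pref: keep the fallback rather than failing.
    return aDefault;
  }
  return value;
}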
Attachment #737981 - Attachment is obsolete: true
Attachment #737981 - Flags: feedback?(justin.lebar+bug)
Attachment #738466 - Flags: feedback?(justin.lebar+bug)
Comment on attachment 738466 [details] [diff] [review]
[PATCH v1 mozilla-b2g18] Use cgroups instead of nice values when changing the priority of a process

This patch applies to the mozilla-b2g18 tree.
Attachment #738466 - Attachment description: [PATCH v1] Use cgroups instead of nice values when changing the priority of a process → [PATCH v1 mozilla-b2g18] Use cgroups instead of nice values when changing the priority of a process
Same patch but for mozilla-central.
More polish: now the tasks file is opened only once after group creation so that we don't open()/close() it for every TID we add when calling SetProcessPriority(). It may be time to wrap this stuff into a class with a shutdown observer and proper cleanup...
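A rough sketch of that idea: keep the per-group "tasks" file descriptor open instead of reopening it for every TID. The class name and lifetime handling are illustrative; the actual patch later ties cleanup to a shutdown observer.

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Illustrative: cache the "tasks" fd of one cgroup so SetProcessPriority()
// doesn't open()/close() the pseudo-file for every thread it moves.
class CGroupTasksFile
{
public:
  explicit CGroupTasksFile(const char* aGroup) : mFd(-1)
  {
    char path[128];
    snprintf(path, sizeof(path), "/dev/cpuctl/%s/tasks", aGroup);
    mFd = open(path, O_WRONLY | O_CLOEXEC);
  }

  ~CGroupTasksFile()
  {
    if (mFd >= 0) {
      close(mFd);
    }
  }

  // Move one thread into this group by writing its TID.
  bool AddThread(int aTid)
  {
    if (mFd < 0) {
      return false;
    }
    char buf[16];
    int len = snprintf(buf, sizeof(buf), "%d", aTid);
    return write(mFd, buf, len) == len;
  }

private:
  int mFd;
};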
Attachment #738466 - Attachment is obsolete: true
Attachment #738466 - Flags: feedback?(justin.lebar+bug)
Attachment #738566 - Flags: feedback?(justin.lebar+bug)
Revised patch: the functionality is fundamentally unchanged compared to the previous one, but I've consolidated all the code into a class, documented it, and added proper cleanup during shutdown.
Attachment #738566 - Attachment is obsolete: true
Attachment #738566 - Flags: feedback?(justin.lebar+bug)
Attachment #739110 - Flags: feedback?(justin.lebar+bug)
Patch that applies cleanly to the m-c tree.
Attachment #738487 - Attachment is obsolete: true
Target Milestone: --- → Madrid (19apr)
Whiteboard: [madrid] → [madrid][status: patches in progress]
I'm testing my patch with today's build but I've hit a significant regression compared to my previous testing (both with and without my patch applied). The times measured via the method I described in comment #15 have skyrocketed: without my patch I've measured times over 10s, and in the 5-7s range with my patch applied (under process-buster stress). I'm unsure what happened compared to two days ago, but from what I can see it seems that hal::SetProcessPriority() is not being called fast enough, leaving a lot of processes in the foreground when the call comes; the time-consuming foreground tasks seem to stick around for seconds before being sent into the background.
> from what I can see it seems that hal::SetProcessPriority() is not being
> called fast enough

Bug 844323 should improve this. I'm sort of happy to hear that it's necessary. :)
(In reply to Justin Lebar [:jlebar] from comment #23)
> Bug 844323 should improve this. I'm sort of happy to hear that it's
> necessary. :)

I haven't looked too much into the ProcessPriorityManager but I had the feeling your patches would be necessary anyway. Do your patches apply to mozilla-b2g18? I'm eager to test them with this change and see if this problem is resolved once and for all.
Attached patch Rough patch to setpriority on all threads (obsolete) (deleted) — Splinter Review
I don't think we want to do this, but I'm attaching this patch in case we need it later for some reason.
Attachment #740284 - Attachment is obsolete: true
Since I can't find you on IRC and since we apparently must have this done by Wednesday, I'm doing a review and fixing up this patch at the same time. I'll post an interdiff for you when I'm done. The code looks pretty good to me. Ping me if you want to talk!
How do nested cgroups work, Gabriele? And how does the scheduler choose between the master process (root cgroup) and a process with limited CPU shares? Do we fall back to completely fair scheduling?
Nested cgroups work hierarchically, with child groups sharing the parent's CPU share. The balancing is done at each level by dividing a group's cpu.shares value by the sum of the cpu.shares values of all groups with active processes at the same level. Child groups share their parent's CPU allocation and are balanced against it using cpu.shares as if they were at the same level as the parent.

For example, if A is a group with cpu.shares = 100 and A1 its child with cpu.shares = 50, the child will take 1/3 of the parent's CPU share if both a parent and a child process are active (50 / (100 + 50)). If you add another child, say A2, with cpu.shares = 100, then the allocation will be:

A  = 100 / (100 + 50 + 100) = 0.4 -> 40% CPU time
A1 =  50 / (100 + 50 + 100) = 0.2 -> 20% CPU time
A2 = 100 / (100 + 50 + 100) = 0.4 -> 40% CPU time

The master group has its own cpu.shares, which by default is 1024; all top-level groups are balanced against it, so in my patch the shares are:

- master                  1024
- background               103
- background/perceivable   103
- foreground               921
- foreground/high         1381

This translates into the following CPU allocation with processes active in master, background and foreground:

- master      1024 / (1024 + 103 + 921) = ~50%
- background   103 / (1024 + 103 + 921) = ~5%
- foreground   921 / (1024 + 103 + 921) = ~45%

If only background and foreground had active processes this would be:

- background   103 / (103 + 921) = ~10%
- foreground   921 / (103 + 921) = ~90%

The sub-groups are as follows:

- background/perceivable has 103 shares, so it takes ~50% of the CPU share of background (103 / (103 + 103))
- foreground/high has 1381 shares, so it takes ~60% of the CPU share of foreground (1381 / (1381 + 921))

Actually I wanted the latter to be ~75% of foreground, but I must have got my calculations wrong when writing the patch; cpu.shares for it should be set to 2763, not 1381.
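To double-check the arithmetic above, a tiny standalone snippet that computes the split between the top-level groups from their cpu.shares values (1024 being the root group's default weight); the numbers are the ones quoted in this comment.

#include <cstdio>

// Illustrative: the effective CPU split between sibling cgroups is each
// group's cpu.shares divided by the sum of the shares of all siblings that
// currently have runnable tasks.
int main()
{
  const char* names[]  = { "master", "background", "foreground" };
  const double shares[] = { 1024, 103, 921 };

  double total = 0;
  for (double s : shares) {
    total += s;
  }
  for (int i = 0; i < 3; i++) {
    printf("%-12s %4.0f / %4.0f = %4.1f%%\n",
           names[i], shares[i], total, 100.0 * shares[i] / total);
  }
  return 0;
}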
Even with this patch I'm still seeing calls take an extra 3s under load (which is roughly what I saw with nice), and I'm seeing nominally background browser processes taking up a lot of CPU. So at least one of these must be true, I think:

1. the processes aren't actually getting significantly lowered CPU priority (maybe something is wrong with our usage of cgroups or the kernel's implementation of it).

2. the communications + master processes don't have enough to do while a call is incoming to eat up 100% of CPU. But then this doesn't explain why receiving a call is slower when we're under process-buster load.

3. the bg processes are causing the master process to do work on their behalf, and this is slowing the communications app down, and then this frees up timeslots for the bg processes to run more.

I think we need to differentiate between these possibilities.
jld suggested that I could modify the whole-system profiler to tell us which processes/tasks are blocked. We can read this from /proc/pid/stat (it's the third field). What I want to see is whether the communications app is blocked.
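A minimal sketch of reading that field, assuming the usual /proc/<pid>/stat layout ("pid (comm) state ..."); since comm may itself contain spaces or parentheses, it's safer to scan back from the last ')' than to split on whitespace. The function name is illustrative.

#include <cstdio>
#include <cstring>

// Illustrative: return the state field of /proc/<pid>/stat, e.g. 'R' running,
// 'S' sleeping, 'D' blocked on I/O, or '?' if it can't be read.
static char GetProcessState(int aPid)
{
  char path[64];
  snprintf(path, sizeof(path), "/proc/%d/stat", aPid);

  FILE* f = fopen(path, "r");
  if (!f) {
    return '?';
  }

  char buf[512];
  size_t len = fread(buf, 1, sizeof(buf) - 1, f);
  fclose(f);
  buf[len] = '\0';

  // Format is "<pid> (<comm>) <state> ...": the state is two chars past the
  // last ')', which skips over any parentheses inside the comm field.
  const char* end = strrchr(buf, ')');
  if (!end || strlen(end) < 3) {
    return '?';
  }
  return end[2];
}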
WRT nested cgroups, the kernel (b2g/kernel/sched.c) has an important example:

> * In other words, if root_task_group has 10 tasks of weight
> * 1024) and two child groups A0 and A1 (of weight 1024 each),
> * then A0's share of the cpu resource is:
> *
> * A0's bandwidth = 1024 / (10*1024 + 1024 + 1024) = 8.33%

I take this to mean that if we have 10 bg processes each with weight 5, and we have one bg/perceivable process with weight 20, then the bg/perceivable process will get

20 / (10 * 5 + 20)

of the CPU time. That is, background processes can overwhelm bg/perceivable processes, and foreground processes can overwhelm fg/perceivable processes. If I'm understanding this correctly, this is not what we want.

These nested processes seem difficult to reason about; perhaps we should avoid them. Or at the very least, it seems that the "nestee" should always be /lower/ priority than the category in which it's nested.
(In reply to Justin Lebar [:jlebar] from comment #31)
> WRT nested cgroups, the kernel (b2g/kernel/sched.c) has an important example:
>
> > * In other words, if root_task_group has 10 tasks of weight
> > * 1024) and two child groups A0 and A1 (of weight 1024 each),
> > * then A0's share of the cpu resource is:
> > *
> > * A0's bandwidth = 1024 / (10*1024 + 1024 + 1024) = 8.33%
>
> I take this to mean that if we have 10 bg processes each with weight 5, and
> we have one bg/perceivable process with weight 20, then the bg/perceivable
> process will get
>
> 20 / (10 * 5 + 20)

What you want here is putting the 10 bg processes in one subgroup and the bg/perceivable process in another subgroup.
I moved this function out of hal because I want to share it between two different source files. The only thing I'm not sure about here is the #ifdef MOZILLA_INTERNAL_API. I've never quite understood that bit...
Attachment #740676 - Flags: review?(mh+mozilla)
Attached patch Part 2: Use cgroups in hal::SetProcessPriority. (obsolete) (deleted) — Splinter Review
I've been developing this patch against b2g18, but the code should be basically the same on trunk.

This is a similar approach to the one in the patch you posted, but I changed some things. The most major change is that I no longer cache anything (including the fd); I felt that it was a premature optimization. I also got rid of ProcessesPriorityManager; we don't need a shutdown observer, so we don't really need that class. I also moved everything into a new file, which of course makes diffs hard. :)

I'm really sorry to have stolen your patch; I wouldn't have done it if I didn't have managers pointing guns at my head.

The prefs certainly need to be changed; I changed them from your values for my own testing, and I didn't change them back because it seems like we need to think a bit more carefully about how we structure these cgroups.

What do you think of this patch? At the very least we need to figure out if this fixes the root bug; if you have some time to test this, that would be much appreciated.
Attachment #739110 - Attachment is obsolete: true
Attachment #739110 - Flags: feedback?(justin.lebar+bug)
Attachment #740679 - Flags: review?(gsvelto)
Comment on attachment 740676 [details] [diff] [review]
Part 1: Add WriteToFile to xpcom/glue/FileUtils.

Review of attachment 740676 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/glue/FileUtils.cpp
@@ +23,1 @@
> bool

Can you fix the trailing whitespace on this line while you're here?

@@ +103,5 @@
> +  int fd;
> +  while ((fd = open(aFilename, O_WRONLY)) == -1 && errno == EINTR)
> +  {}
> +
> +  ScopedClose autoClose(fd);

You can move this declaration under the early return if fd < 0.

@@ +110,5 @@
> +  }
> +
> +  int rv;
> +  while ((rv = write(fd, aToWrite, strlen(aToWrite))) == -1 &&
> +         (errno == EAGAIN || errno == EINTR))

You don't have to test EAGAIN unless you open with O_NONBLOCK.

::: xpcom/glue/FileUtils.h
@@ +18,5 @@
>
>  #include "mozilla/Scoped.h"
>
> +#ifdef MOZILLA_INTERNAL_API
> +#include "nsString.h"

If you include nsStringGlue.h instead, you can lift the MOZILLA_INTERNAL_API ifdefs. But it probably doesn't matter much.

@@ +76,5 @@
> + */
> +bool WriteToFile(const char* aFilename, const char* aToWrite);
> +bool WriteToFile(const nsACString& aFilename, const char* aToWrite);
> +bool WriteToFile(const char* aFilename, const nsACString& aToWrite);
> +bool WriteToFile(const nsACString& aFilename, const nsACString& aToWrite);

Is there really interest in all combinations?
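For context, a minimal standalone sketch of the kind of helper being reviewed, with the review comments folded in (retry open() and write() on EINTR only; EAGAIN can't happen without O_NONBLOCK). This is not the landed FileUtils code, just an illustration.

#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// Illustrative sketch of a WriteToFile-style helper: open a file for writing,
// write a string into it, retrying both syscalls if interrupted by a signal.
static bool WriteToFileSketch(const char* aFilename, const char* aToWrite)
{
  int fd;
  while ((fd = open(aFilename, O_WRONLY)) == -1 && errno == EINTR) {
  }
  if (fd < 0) {
    return false;
  }

  size_t len = strlen(aToWrite);
  ssize_t rv;
  while ((rv = write(fd, aToWrite, len)) == -1 && errno == EINTR) {
  }

  close(fd);
  return rv == static_cast<ssize_t>(len);
}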
Attachment #740676 - Flags: review?(mh+mozilla) → review+
Comment on attachment 740679 [details] [diff] [review]
Part 2: Use cgroups in hal::SetProcessPriority.

Review of attachment 740679 [details] [diff] [review]:
-----------------------------------------------------------------

The patch looks good to me; I've added a bunch of notes here and there, nothing major though. I'll try to test it ASAP, but should I do it in isolation or together with the patches in bug 844323? From what I saw the last time I tested this, unless we find a way to fast-path the Communications process into the highest priority group this doesn't help much, as other foreground processes are moved to the lower priority groups only slowly.

::: hal/gonk/GonkHal.cpp
@@ +14,5 @@
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
>
> +#include <dirent.h>

This looks like a leftover from my patch (required by opendir() and friends), you can remove it.

@@ +24,2 @@
>  #include <sys/resource.h>
> +#include <sys/stat.h>

This one too, I added it because I needed mkdir().

::: hal/gonk/GonkProcessPriority.cpp
@@ +14,5 @@
> +
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <asm/page.h>

You're missing dirent.h, sys/stat.h as well as errno.h here. BTW I had to add errno.h to FileUtils.cpp too after applying your other patch in order for it to compile on my box (Fedora 18 x86-64).

@@ +20,5 @@
> +#ifdef LOG
> +#undef LOG
> +#endif
> +
> +#define LOG(args...) __android_log_print(ANDROID_LOG_INFO, "Gonk", args)

Why not use HAL_LOG() which is defined in Hal.h?

@@ +67,5 @@
> +  nsresult rv = Preferences::GetInt(prefName.get(), &result);
> +
> +  if (NS_FAILED(rv)) {
> +    MOZ_CRASH();
> +  }

Wouldn't it be better to MOZ_ASSERT(NS_SUCCEEDED(rv))?

@@ +85,5 @@
> +                                     &result);
> +
> +  if (NS_FAILED(rv)) {
> +    MOZ_CRASH();
> +  }

Same as before.

@@ +159,5 @@
> +  if (cgroupName.IsEmpty()) {
> +    return EmptyCString();
> +  }
> +
> +  return NS_LITERAL_CSTRING("/dev/cpuctl/") + cgroupName +

You could have used kDevCpuCtl here too.

@@ +259,5 @@
> +  if (aPriority < 0 ||
> +      static_cast<uint32_t>(aPriority) >= priorityClasses->Length()) {
> +    return nullptr;
> +  }
> +

I would have called MOZ_ASSERT(aPriority >= 0 && static_cast<uint32_t>(aPriority) < priorityClasses->Length()) here instead, unless there's another reason for returning nullptr.

@@ +374,5 @@
> +
> +  // If pc->CGroup() is empty, we should remove the pid from its current
> +  // cgroup. But that's kind of tricky to do, because we don't know what that
> +  // group is! Instead, we just bail. In practice, CGroup() is empty only for
> +  // the master process, which never changes priority.

There's no need to remove the PIDs from the previous cgroup; just lump everything into /dev/cpuctl/tasks and that will put them in the master group. IIRC my patch was designed to do this, though I agree this should never happen in practice.
Attachment #740679 - Flags: review?(gsvelto) → review+
(In reply to Justin Lebar [:jlebar] from comment #34)
> This is a similar approach to the one in the patch you posted, but I changed
> some things. The most major change is that I no longer cache anything
> (including the fd); I felt that it was a premature optimization.

The reason I did it is that when a call hits under process-buster stress, moving all the TIDs amounts to a *lot* of syscalls and I feared this could slow us down. Admittedly I didn't actually measure their cost, but since it was an easy fix and made the patch cleaner in my eyes I went for it.

> I'm really sorry to have stolen your patch; I wouldn't have done it if I
> didn't have managers pointing guns at my head.

No problem at all, I would have worked on it myself yesterday had it not been such a special day.

> The prefs certainly need to be changed; I changed them from your values for
> my own testing, and I didn't change them back because it seems like we need
> to think a bit more carefully about how we structure these cgroups.

Yeah, the reason I went for prefs was that I was pretty much 100% sure we would need to change the actual parameters. I also like how you renamed them and wrapped the PriorityClass class around them; it makes sense and makes the code much more readable.
Comment on attachment 740679 [details] [diff] [review]
Part 2: Use cgroups in hal::SetProcessPriority.

I've done my testing and there's something that's not working right. I can see the main b2g process being handled like other processes: it starts in the foreground group and is moved into the background group when the phone idles... I'll try to figure out what the problem is; in the meantime I've refreshed my patch for bug 861434 and will be posting it soon together with the b2g18 back-port.
Attachment #740679 - Flags: review+ → review-
> Why not use HAL_LOG() which is defined in Hal.h?

Our logging story is a mess. If I used HAL_LOG, then if my build was made without PR logging enabled, there's no way to enable logging in this file. Given how much trouble we have with this stuff, I kind of think that having logging enabled here all the time would be worthwhile, at least for now.

> +  if (NS_FAILED(rv)) {
> +    MOZ_CRASH();
> +  }
>
> Wouldn't it be better to MOZ_ASSERT(NS_SUCCEEDED(rv))?

MOZ_ASSERT is debug-only, and MOZ_CRASH is everywhere. My thought is that because so many of us only use release builds, we want to be really loud about this. We had a problem in the past where we were missing a pref (due to a typo in the prefs file) and we didn't notice! MOZ_ASSERT /might/ find this on TBPL, depending on which builds we run.

> I would have called MOZ_ASSERT(aPriority >= 0 &&
> static_cast<uint32_t>(aPriority) < priorityClasses->Length()) here instead,
> unless there's another reason for returning nullptr.

If we MOZ_ASSERT'ed this, what would we return in release builds if aPriority was out of range? I don't want to return uninitialized memory when we can easily return null.

> just lump everything into /dev/cpuctl/tasks and that will put it in the
> master group. IIRC my patch was designed to do this though I agree this
> should never happen in practice.

Ah, perfect. Your patch probably did the right thing, and then I didn't notice and so wasn't sure what to do in my patch.
> Is there really interest in all combinations?

At least three of them! Using nsStringGlue seems to work; thanks.
Attachment #740679 - Flags: review?(gsvelto)
> I've done my testing and there's something that's not working right. I can
> see the main b2g process being handled like other processes, it starts in
> the foreground group and is moved into the background group when the phone
> idles.

Yeah, I saw this too. Something strange is going on...
> $ adb shell b2g-procrank
>   APPLICATION      PID      Vss      Rss      Pss      Uss  cmdline
>           b2g     1027   69328K   59288K   48976K   45380K  /system/b2g/b2g
>    Homescreen     1101   26940K   26852K   16913K   13516K  /system/b2g/plugin-container
>         Usage     1074   24000K   23912K   14357K   11340K  /system/b2g/plugin-container
> (Preallocated a   1106   19676K   19576K   10823K    8256K  /system/b2g/plugin-container
>                                    ------   ------   ------
>                                            105080K   90400K  TOTAL
>
> # With the screen off
> $ adb shell cat /proc/1027/cgroup
> 2:cpu:/background
> 1:cpuacct:/uid/0
>
> # Turn the screen on and unlock the phone. This brings the homescreen app into
> # the fg, but should not affect the master process.
> $ adb shell cat /proc/1027/cgroup
> 2:cpu:/foreground
> 1:cpuacct:/uid/0

I added a printf to show me exactly what I'm writing into /dev/cpuctl. When I turn on the screen, we only write tid's from the homescreen task into /dev/cpuctl/foreground/tasks. I checked that the tid's match the entries in /proc/1101/task, and I checked that none of the tid's are in /proc/1027/task.

I wonder what's going on here. I think I'm going to see if this happens on my Linux desktop; maybe that will inform us as to whether there may be a cgroups bug here.

Note to anyone inspired to try this: OOP is disabled on b2g desktop. You need to flip a pref; search for "nice things" in b2g.js.
I did some further testing with my old patch by adding a SetProcessPriority(getpid(), PROCESS_PRIORITY_FOREGROUND_HIGH) call in Navigator::RequestWakeLock() to see how this would affect CPU allocation. With this change I can see the communications app being moved immediately into the highest cgroup but it still takes ages to start. I've re-tested the CPU allocation manually between different cgroups and it works as expected so the problem must be somewhere else (with your parameters for example putting a CPU consumer in profile_high while running process-buster at the same time gives the first process ~80% CPU).
Looking through the kernel code, if there's a bug with the per-process cgroup handling, I kind of expect there would be a bug with the per-task cgroup handling. They both run cgroup_attach_proc; the only difference is that in one case the task is the "tgroup leader", while in the other case the task is the thread.
(In reply to Justin Lebar [:jlebar] from comment #44)
> Looking through the kernel code, if there's a bug with the per-process
> cgroup handling, I kind of expect there would be a bug with the per-task
> cgroup handling. They both run cgroup_attach_proc; the only difference is
> that in one case the task is the "tgroup leader", while in the other case
> the task is the thread.

That doesn't sound promising; on top of this I kept testing and testing and I still can't get decent results, in fact the delay is truly horrible no matter what combination I use. There's one thing that made me raise an eyebrow though: when I start hearing the calling phone ring I see this in the logcat of the called device (but only when under process-buster stress):

I/Gecko ( 4884): [Parent 4884] WARNING: pipe error (154): Connection reset by peer: file /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 432
...
I/Gecko ( 4884): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
I/Gecko ( 4884): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
I/Gecko ( 4884): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
I/Gecko ( 4884): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
...

(a bunch of those, probably one per process from the looks of it)

In this case 4884 is the main process and I'm wondering what messages are being lost and, more importantly, why.
> In this case 4884 is the main process and I'm wondering what messages are
> being lost and more importantly why.

A child process has died (process-buster stress intentionally creates a state where we're launching and killing many processes). When the main process tries to send it a message, we get this error. You can set the env var MOZ_IPC_MESSAGE_LOG=1 to see which messages are relevant here. I suspect it's not a big deal.

The android cgroups implementation is not straight from upstream, so it's possible they introduced a bug. The behavior we're seeing sounds super-weird. But it also is sounding to me like perhaps we should go back to using nice; I don't see a lot of advantages to cgroups if we're not confident that they're correct.
This addresses gsvelto's review comments. We probably don't want to use this, but I want to save the patch just in case.
Attachment #740679 - Attachment is obsolete: true
Attachment #740284 - Attachment is obsolete: false
Comment on attachment 741013 [details] [diff] [review]
When changing a process's CPU priority, call setpriority() on all threads, not just the main thread.

Review of attachment 741013 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me save for the leftover comment below; I will test it tomorrow.

::: hal/gonk/GonkHal.cpp
@@ +1163,5 @@
> +      continue;
> +    }
> +
> +    //LOG("changed nice for tid %d (pid %d) from %d to %d.",
> +    //    tid, apid, origtaskpriority, newtaskpriority);

Just a forgotten comment to get rid of here.
Attachment #741013 - Flags: review?(gsvelto) → review+
Comment on attachment 741013 [details] [diff] [review]
When changing a process's CPU priority, call setpriority() on all threads, not just the main thread.

Review of attachment 741013 [details] [diff] [review]:
-----------------------------------------------------------------

OK, I've had time to test the patch and I confirm that it works as expected. Unfortunately it doesn't help much with prioritizing calls; I can see an 8-12s delay when getting calls under process-buster stress, which I don't think will go away unless we find a way not only to push the dialer into the foreground fast but also to move everybody else into the background just as fast.

Just one additional functional nit: the background nice value is set to 20 in the prefs, which means that background processes' nice value will be truncated to 19 and DOM workers will also get that. When a process is brought into the foreground again all threads will have the same nice value. It would be better to change b2g.js like this to prevent this "flattening":

pref("hal.processPriorityManager.gonk.backgroundHomescreenNice", 18);
pref("hal.processPriorityManager.gonk.backgroundNice", 18);

Finally, the patch needs some case changes to the variable names to build correctly; see below.

::: hal/gonk/GonkHal.cpp
@@ +1133,5 @@
> +
> +  while (struct dirent* de = readdir(tasksDir)) {
> +    char* endptr = nullptr;
> +    long tidlong = strtol(de->d_name, &endptr, /* base */ 10);
> +    if (*endptr || tidlong < 0 || tidlong > int32_max || tidlong == apid) {

int32_max -> INT32_MAX, apid -> aPid

@@ +1143,5 @@
> +    int tid = static_cast<int>(tidlong);
> +
> +    errno = 0;
> +    // get and set the task's new priority.
> +    int origtaskpriority = getpriority(prio_process, tid);

origtaskpriority -> origTaskPriority, prio_process -> PRIO_PROCESS

@@ +1147,5 @@
> +    int origtaskpriority = getpriority(prio_process, tid);
> +    if (errno) {
> +      LOG("unable to get nice for tid=%d (pid=%d); error %d. this isn't "
> +          "necessarily a problem; it could be a benign race condition.",
> +          tid, apid, errno);

apid -> aPid

@@ +1153,5 @@
> +    }
> +
> +    int newtaskpriority =
> +      ns_max(origtaskpriority + anice - origprocpriority, origprocpriority);
> +    rv = setpriority(prio_process, tid, newtaskpriority);

newtaskpriority -> newTaskPriority, origtaskpriority -> origTaskPriority, anice -> aNice, origprocpriority -> origProcPriority, prio_process -> PRIO_PROCESS

@@ +1158,5 @@
> +
> +    if (rv) {
> +      LOG("unable to set nice for tid=%d (pid=%d); error %d. this isn't "
> +          "necessarily a problem; it could be a benign race condition.",
> +          tid, apid, errno);

apid -> aPid

@@ +1169,5 @@
> +
> +  LOG("Changed nice for pid %d from %d to %d.",
> +      aPid, origProcPriority, aNice);
> +
> +  closedir(tasksdir);

tasksdir -> tasksDir
> Finally, the patch needs some case changes to the variable names to build
> correctly; see below.

Somehow something strange happened when I imported or exported this patch. I don't know if it was an errant vim command or what. Sorry to put you through that; I promise it built at one point!

> Just one additional functional nit: the background nice value is set to 20
> in the prefs, which means that background processes' nice value will be
> truncated to 19 and DOM workers will also get that. When a process is
> brought into the foreground again all threads will have the same nice value.
> It would be better to change b2g.js like this to prevent this "flattening":

Yes, definitely.

Maybe you could try the patches in bug 847592? Those renice every process as soon as a call is incoming, just as you describe.
(In reply to Justin Lebar [:jlebar] from comment #51)
> Sorry to put you through that; I promise it built at one point!

No problem, it looked like some automatic replace/turn-to-lower-case command had flattened half of the patch ;)

> Maybe you could try the patches in bug 847592? Those renice every process
> as soon as a call is incoming, just as you describe.

Yes, I'm definitely going to try them. On this topic I did some crude testing by using these nice values to "simulate" foreground processes going quickly into the background:

pref("hal.processPriorityManager.gonk.masterNice", 0);
pref("hal.processPriorityManager.gonk.foregroundHighNice", 1);
pref("hal.processPriorityManager.gonk.foregroundNice", 14);
pref("hal.processPriorityManager.gonk.backgroundPerceivableNice", 16);
pref("hal.processPriorityManager.gonk.backgroundHomescreenNice", 18);
pref("hal.processPriorityManager.gonk.backgroundNice", 18);

Plus a call to hal::SetProcessPriority() within Navigator::RequestWakeLock(), and this didn't help the times much (if at all), so something else must be going on as you were fearing. I've also noticed some additional weirdness, like some totally unrelated daemons getting a good chunk of CPU time during process-buster stress (kswapd, mmcqd). Once we land the patches this merits some serious inspection with perf.
Target Milestone: 1.0.1 Madrid (19apr) → 1.0.1 Cert2 (28may)
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [madrid][status: patches in progress] → [madrid]