Closed Bug 841651 Opened 12 years ago Closed 12 years ago

[OTA] PR_SetThreadPriority doesn't do anything when using Bionic

Categories

(NSPR :: NSPR, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:tef+, firefox20 wontfix, firefox21 wontfix, firefox22 fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

RESOLVED FIXED
blocking-b2g tef+
Tracking Status
firefox20 --- wontfix
firefox21 --- wontfix
firefox22 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: dhylands, Assigned: gsvelto)

References

Details

(Whiteboard: [target 28/2])

Attachments

(2 files, 7 obsolete files)

(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
So in talking with bent on IRC, RuntimeService::ScheduleWorker (http://mxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#933) supposedly lowers the priority of DOM Worker threads. The priority->SetPriority resolves to nsThread::SetPriority (http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#659) which in turn calls PR_SetThreadPriority (http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptthread.c#613) Under bionic, neither _PR_DCETHREADS nor _POSIX_THREAD_PRIORITY_SCHEDULING is defined (I confirmed by adding a #else and a print and saw the print). This means that calls to PR_SetThreadPriority essentially do nothing under bionic.
It basically means that Workers can swamp the main thread on B2G (and fennec I think)...
This is further confirmed by doing: 736 >adb shell 'cat /proc/109/task/*/stat' | grep DOM 250 (DOM Worker) S 1 109 0 0 -1 4194368 2204 1881 1 0 37 5 0 2 20 0 27 0 660 164769792 14170 4294967295 32768 126328 3197418368 1138752904 1074603756 0 0 4096 50927 3222128320 0 0 -1 0 0 0 0 0 0 252 (DOM Worker) S 1 109 0 0 -1 4194624 1405 1881 0 0 7 18 0 2 20 0 27 0 677 164769792 14170 4294967295 32768 126328 3197418368 1149238664 1074603756 0 0 4096 50927 3222128320 0 0 -1 0 0 0 0 0 0 253 (DOM Worker) S 1 109 0 0 -1 4194368 1328 1881 0 0 10 1 0 2 20 0 27 0 677 164769792 14170 4294967295 32768 126328 3197418368 1151202184 1074603352 0 0 4096 50927 3222519780 0 0 -1 0 0 0 0 0 0 254 (DOM Worker) S 1 109 0 0 -1 4194624 1360 1881 0 0 4 1 0 2 20 0 27 0 686 164769792 14170 4294967295 32768 126328 3197418368 1164967304 1074603756 0 0 4096 50927 3222128320 0 0 -1 0 0 0 0 0 0 (109 = pid of b2g main process). Column 18 of /proc/PID/stat is the priority, and column 19 is the nice value. 20 = the default priority (confirmed by nice = 0).
Assignee: nobody → wtc
Component: General → NSPR
Product: Boot2Gecko → NSPR
Version: unspecified → 4.9.6
Summary: PR_SetThreadPriority doesn't do anything under gonk → PR_SetThreadPriority doesn't do anything when using Bionic
In Chromium's platform_thread_posix.cc, PlatformThread::SetThreadPriority calls into Java on Android: http://src.chromium.org/viewvc/chrome/trunk/src/base/threading/platform_thread_posix.cc?view=markup I guess that approach won't work on Firefox OS. Can you try using the _POSIX_THREAD_PRIORITY_SCHEDULING code? The _PR_DCETHREADS code is obsolete.
blocking-b2g: --- → tef?
Have we seen any user fallout from this bug? Or is this mostly theoretical/correctness.
Bug 841041 is one thing that users are seeing in the real world.
Blocking since our OTA update experience needs to be rock solid.
blocking-b2g: tef? → tef+
Blocks: 841962
Since there hasn't been much action here in the past few days I'm taking this, feel free to re-take if you have a solution ready. Looking at the bionic sources it seems that pthread_setschedparam() is properly implemented: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/pthread.c However other functions are not (such as pthread_attr_setinheritsched() and friends) so defining _POSIX_THREAD_PRIORITY_SCHEDULING in ptthread.c when on Android doesn't cut it. What we need is some slightly finer grained tests to check which functions are available and which are not instead of depending on the _POSIX_THREAD_PRIORITY_SCHEDULING macro being defined. I'll attach a WIP patch shortly.
Assignee: wtc → gsvelto
Status: NEW → ASSIGNED
This is a crude WIP patch that ensures that pthread_setschedaparam() & friends are being used throughout ptthread.c. I'm using it for testing and I can see the calls being executed successfully. I'll try to reproduce the OTA issue without the patch applied and see if it fixes it; once I'm done with the testing I'll put up a proper patch with the appropriate configure-time checks and changes.
This is a cleaned up patch against NSPR that checks for pthread_attr_setinheritsched() which is the only function that seems to be missing from Android and then avoids its uses in the ptthread.c code. The functionality of PR_SetThreadPriority() if I understand the code correctly (and real time POSIX threads extensions). I've tested the patch and verified that pthread_setschedparam() is being invoked and returns successfully however the priorities under /proc (comment 2) seem unaffected. I will do some further tests to see if I can reproduce the issue and if the patch fixes it. BTW it applies cleanly to both mozilla-central and mozilla-b2g18 so it should be easy to test it on both.
Attachment #715439 - Attachment is obsolete: true
Attachment #715473 - Flags: feedback?(dhylands)
I think that trying to set the priority directly only works with the real-time scheduling policies SCHED_FIFO and SCHED_RR (and the lowest realtime priority is higher than the highest normal priority, so we don't want to do that). The scheduling policy is in column 40 (of /proc/PID/stat) and since it's 0 that corresponds to SCHED_OTHER (the normal linux scheduling policy). I think we need to set the nice value, although the docs suggest that the nice value is per-process and not per-thread (http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html) Although other documentation seems to contradict this (and some of the confusion comes from the kernel's notion of pid being different from the user space notion of pid). If we aren't seeing the priority changes reflected in /proc/PID/stat, then I don't think it's actually doing anything. So I think we need to figure that part out first, and then determine if this is even the right API to be calling.
Comment on attachment 715473 [details] [diff] [review] [PATCH] Make PR_SetThreadPriority() work when using Bionic Review of attachment 715473 [details] [diff] [review]: ----------------------------------------------------------------- This looks good to me, although if we're not seeing the priority reflected in /proc/PID/stat then I think we need another bug filed in order to get the DOM workers actually running at a lower priority.
Attachment #715473 - Flags: feedback?(dhylands) → feedback+
Gabriele: sorry about the confusion. I am the default bug assignee for NSPR bugs. I understand this bug has high priority and I have done what I can, but I don't know how to build NSPR for Android. I just checked the Apache Portable Runtime source code but could not find any function for setting thread priority. For example, searching for "prio" and "sched" in the apr_thread_proc.h header finds nothing: http://svn.apache.org/viewvc/apr/apr/trunk/include/apr_thread_proc.h?view=markup Another thing you can try is the setpriority() function used in the Chromium source file that I cited in comment 3. I missed that last time, but it's also in Android-specific code.
(In reply to Dave Hylands [:dhylands] from comment #10) > I think that trying to set the priority directly only works with the > real-time scheduling policies SCHED_FIFO and SCHED_RR (and the lowest > realtime priority is higher than the highest normal priority, so we don't > want to do that). I see, so pthread_setschedparam() might just be ignored by the kernel in our case. > The scheduling policy is in column 40 (of /proc/PID/stat) and since it's 0 > that corresponds to SCHED_OTHER (the normal linux scheduling policy). I > think we need to set the nice value, although the docs suggest that the nice > value is per-process and not per-thread > (http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html) > > Although other documentation seems to contradict this (and some of the > confusion comes from the kernel's notion of pid being different from the > user space notion of pid). We'll have to check, if the nice value can be set per thread then we could use that instead. > If we aren't seeing the priority changes reflected in /proc/PID/stat, then I > don't think it's actually doing anything. So I think we need to figure that > part out first, and then determine if this is even the right API to be > calling. Agreed, I'll investigate a bit more. Bionic is using the sched_setscheduler() syscall, I'll dig out the source for that and see if we can coerce it into lowering those threads priorities.
(In reply to Wan-Teh Chang from comment #12) > Gabriele: sorry about the confusion. I am the default bug assignee > for NSPR bugs. I understand this bug has high priority and I have > done what I can, but I don't know how to build NSPR for Android. OK, now I understand, I picked up the bug in a hurry because we really need it done by the end of the week. As always when taking a bug from somebody else I was a little bit hesitant because from time to time it might not be the nicest thing to do. > I just checked the Apache Portable Runtime source code but could > not find any function for setting thread priority. For example, > searching for "prio" and "sched" in the apr_thread_proc.h header > finds nothing: > http://svn.apache.org/viewvc/apr/apr/trunk/include/apr_thread_proc. > h?view=markup > > Another thing you can try is the setpriority() function used in > the Chromium source file that I cited in comment 3. I missed that > last time, but it's also in Android-specific code. Yes, I'll have a look. There must be a way because the underlying kernel should support setting thread priorities so it's just a matter of finding the right button to push.
I studied the Bionic libc source code and the Linux sched_setscheduler man page. By defaults, threads get the SCHED_OTHER (aka SCHED_NORMAL, value 0) scheduling policy. The threads in this scheduling policy only have a nice value (dynamic priority) that can be set with setpriority(2) or nice(2). To use the pthreads priority, the threads need to have the SCHED_FIFO or SCHED_RR scheduling policy. The threads in these two scheduling policies have a static priority. I guess you can't use SCHED_FIFO and SCHED_RR because you are trying to lower the priority of the DOM Worker threads, right? Note: Linux threads are scheduled in the same way that processes are scheduled. So although the sched_setscheduler man page talks only about processes, I believe it applies to threads as well. See also http://stackoverflow.com/questions/7684404/is-nice-used-to-change-the-thread-priority-or-the-process-priority and http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html (search for "nice").
(In reply to Wan-Teh Chang from comment #15) ...snip... > I guess you can't use SCHED_FIFO and SCHED_RR because you > are trying to lower the priority of the DOM Worker threads, > right? Right. Those both set the process to use the realtime priorities, and the lowest realtime priority is higher than the highest normal priority. > Note: Linux threads are scheduled in the same way that > processes are scheduled. So although the sched_setscheduler > man page talks only about processes, I believe it applies > to threads as well. See also > http://stackoverflow.com/questions/7684404/is-nice-used-to-change-the-thread- > priority-or-the-process-priority > > and > > http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html > > (search for "nice"). Yeah - I think we need to verify that we can, in fact, use setpriority/nice for threads. If they do work, we can either modify PR_SetThreadPriority to use those instead, or fix the calling code to call them.
(In reply to Wan-Teh Chang from comment #15) > I studied the Bionic libc source code and the Linux > sched_setscheduler man page. > > [snip] > > Note: Linux threads are scheduled in the same way that > processes are scheduled. So although the sched_setscheduler > man page talks only about processes, I believe it applies > to threads as well. See also > http://stackoverflow.com/questions/7684404/is-nice-used-to-change-the-thread- > priority-or-the-process-priority > > and > > http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html > > (search for "nice"). Thanks for investigating! As Dave mentioned I'll try changing the nice value for threads today and if it works I'll re-implement PR_SetThreadPriority() using the necessary calls.
After reading more documentation I'm under the impression that we should be able to re-nice a thread with the following combination: using gettid() to obtain the thread ID and then using this value when calling setpriority(PRIO_PROCESS, <thread id>, <nice value>). This *should* allow us to change the nice value of a single thread; I'll try to verify if it works first locally and then on the device and if it does I'll cook up a patch using this combination in PR_SetThreadPriority().
Gabriele: yes. We should be able to set the nice value of the current thread. We should be able to pass 0 or some symbolic constant as the second argument to setpriority() to indicate the current thread/process. It should not be necessary to call gettid(). Please try both. As for setting the nice value of an arbitrary thread, unless there is a non-standard pthread function that converts a pthread_t to a pid_t, we won't be able to do that. This limitation is not a problem for your needs. So you can add a check in PR_SetThreadPriority to compare the PRThread* argument with PR_GetCurrentThread(), and only call setpriority() if they are equal. Also, NSPR has four priority levels: LOW, NORMAL, HIGH, and URGENT. If it requires root privilege to set a negative nice value, it is fine to map NORMAL, HIGH, and URGENT to the nice value 0 and map LOW to an appropriate positive nice value. We do not need to solve this problem completely.
(In reply to Wan-Teh Chang from comment #19) > Gabriele: yes. We should be able to set the nice value of the current > thread. We should be able to pass 0 or some symbolic constant as the > second argument to setpriority() to indicate the current thread/process. > It should not be necessary to call gettid(). Please try both. I will though I think that in that case the priority of the whole process is affected. > As for setting the nice value of an arbitrary thread, unless there is > a non-standard pthread function that converts a pthread_t to a pid_t, > we won't be able to do that. This limitation is not a problem for your > needs. > > So you can add a check in PR_SetThreadPriority to compare the PRThread* > argument with PR_GetCurrentThread(), and only call setpriority() if > they are equal. I haven't found any way to convert between pthread_t handles and pid_t values however I was experimenting with adding a pid_t field to |struct PRThread| and then setting it with gettid() in _pt_root(). Does that sound like a reasonable solution to you? It shouldn't be too invasive and would give us all the full functionality of PR_SetThreadPriority(). > Also, NSPR has four priority levels: LOW, NORMAL, HIGH, and URGENT. > If it requires root privilege to set a negative nice value, it is fine > to map NORMAL, HIGH, and URGENT to the nice value 0 and map LOW to an > appropriate positive nice value. We do not need to solve this problem > completely. From my testing LOW seems to be translated by pt_PriorityMap() into the value 1 which is accepted by the kernel. I'll try with the other values and see if they also work, if they don't I'll make sure they're mapped to 0.
This patch implements PR_SetThreadPriority() by doing the following: - adding a 'tid' value to each PRThread object obtained via gettid() during thread initialization - modifying PR_SetThreadPriority to use setpriority() to change the nice value of a thread - using pt_PriorityMap to map the NSPR priority values to the following nice values: PR_PRIORITY_LOW 1 PR_PRIORITY_NORMAL 0 PR_PRIORITY_HIGH -1 PR_PRIORITY_URGENT -2 I've verified that the negative nice values can be set on the device and I've also verified that the DOM Worker threads are getting a lower priority by looking them up under /proc/stat.
Attachment #715473 - Attachment is obsolete: true
Attachment #716112 - Flags: review?(wtc)
This is the try run for the attached patch: https://tbpl.mozilla.org/?tree=Try&rev=1c8ce565ce77 Do you think you can review the patch soon? If you're busy I can see if somebody else can do the review; the issue here being that we need to land this before the 28th of February so the sooner the better.
Whiteboard: [target 28/2]
I've had some time to revise the patch and extend it since I noticed that this problem wasn't affecting only Android but also Linux for the very same reason. This patch is more extensive and works on both systems by carefully checking what's available to set the nice value. In addition to this it also properly set the priority of new threads at creation time. This patch still has an issue however in that PR_SetThreadPriority() is racy due to _pt_root() setting the |tid| value when starting up. If the calling thread quickly calls PR_SetThreadPriority() on a newly created thread it might see a (harmless) value of 0 in |tid|. A quick fix could be done by marking tid volatile and spinning over it until it gets set, a more thorough one would be adding another mutex & condition variable to pt_book used to serialize access to |tid| but that seems overkill to me for an essentially harmless condition.
Attachment #716112 - Attachment is obsolete: true
Attachment #716112 - Flags: review?(wtc)
Attachment #717885 - Flags: review?(wtc)
Summary: PR_SetThreadPriority doesn't do anything when using Bionic → [OTA] PR_SetThreadPriority doesn't do anything when using Bionic
Comment on attachment 717885 [details] [diff] [review] [PATCH v3] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values Review of attachment 717885 [details] [diff] [review]: ----------------------------------------------------------------- Gabriele: thanks a lot for writing the patch. I was sick since last Wednesday. I should have told you who else you can ask for review. (I didn't expect the cold to be so serious.) Sorry! In the future you can ask ted or bsmith to review NSPR patches. I will review the patch tomorrow. I noticed the problem of setting the tid field. I remember I used a lock that you described before, but apparently that code is gone. If you only need to set the priority of the current thread, we can also avoid this problem.
(In reply to Wan-Teh Chang from comment #24) > Gabriele: thanks a lot for writing the patch. I was > sick since last Wednesday. I should have told you > who else you can ask for review. (I didn't expect > the cold to be so serious.) Sorry! In the future > you can ask ted or bsmith to review NSPR patches. > I will review the patch tomorrow. Thank you for doing this and thanks for the heads up to the other reviewers. I can relate to you because a cold knocked me out starting Friday morning and lasting until Monday :) > I noticed the problem of setting the tid field. > I remember I used a lock that you described before, > but apparently that code is gone. If you only need > to set the priority of the current thread, we can > also avoid this problem. Unfortunately not, what the code does is create a new thread and then set its priority so the change always happens on another thread. I'll revise the patch and use the mutex & condition variable in |pt_book| to remove the race.
Revised non-racy patch, uses |pt_book.lock| and |pt_book.cv| to synchronize access to the |tid| field when a new thread starts up. Also initializes the main thread's |tid| field in _PR_InitThreads(). This fixes three distinct issues with the previous patch: - The main thread not being able to call PR_SetThreadPriority() on itself as it's |tid| value was never initialized - A thread quickly calling PR_SetThreadPriority() on a newly created thread could see a |tid| value of 0 due to a race-condition - The corner-case where a thread would be created with a non-default priority argument and then its priority changed quickly thereafter. This was also a potential race between _pt_root() and PR_SetThreadPriority() though unlikely in practice because this pattern is never used in the codebase I've manually checked that the nice value for the DOM Worker threads is being set correctly both on an Otoro device with B2G and on my Linux desktop build.
Attachment #717885 - Attachment is obsolete: true
Attachment #717885 - Flags: review?(wtc)
Attachment #718367 - Flags: review?(wtc)
I mistakenly attached a patch that had a formatting issue, this is the correct one.
Attachment #718367 - Attachment is obsolete: true
Attachment #718367 - Flags: review?(wtc)
Attachment #718401 - Flags: review?(wtc)
Comment on attachment 718401 [details] [diff] [review] [PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values Review of attachment 718401 [details] [diff] [review]: ----------------------------------------------------------------- I spent some time reading the code here to familiarize myself with it. After reading that, this patch looks good. I was a little concerned about the reuse of the CondVar, but it looks like this won't cause any problems. If you can fix my few comments I think we can land this. Wan-Teh may have additional comments, but I think we can fix those after the fact, since it sounds like he's okay with the approach here. ::: nsprpub/pr/include/private/primpl.h @@ +192,5 @@ > + * instead which requires us to track kernel thread IDs for each POSIX > + * thread we create. > + */ > +#if HAVE_SYS_SYSCALL_H > +#include <sys/syscall.h> It looks like this file keeps all its #includes up top, could you move this up to match? ::: nsprpub/pr/src/pthreads/ptthread.c @@ +28,5 @@ > #undef _POSIX_THREAD_PRIORITY_SCHEDULING > #endif > > +#if defined(_PR_NICE_PRIORITY_SCHEDULING) > +#undef _POSIX_THREAD_PRIORITY_SCHEDULING It would be interesting to know if the _POSIX_THREAD_PRIORITY_SCHEDULING codepaths actually do something useful on other POSIX platforms, since they clearly don't work for our purposes on Linux. @@ +134,5 @@ > +#if HAVE_SYSCALL && defined(SYS_gettid) > + thred->tid = syscall(SYS_gettid); > +#elif HAVE_GETTID > + thred->tid = gettid(); > +#endif Do we not have gettid in Bionic? (Is that why you made two paths here?) Could we just use the syscall path in both cases? @@ +914,5 @@ > +#if defined(_PR_NICE_PRIORITY_SCHEDULING) > +#if HAVE_SYSCALL && defined(SYS_gettid) > + thred->tid = syscall(SYS_gettid); > +#elif HAVE_GETTID > + thred->tid = gettid(); Same here as above.
Comment on attachment 718401 [details] [diff] [review] [PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values Review of attachment 718401 [details] [diff] [review]: ----------------------------------------------------------------- r=wtc. I am giving you my review+ in advance because the changes I suggest are very minor. Ted: NSPR source code is being migrated from CVS to Hg. So this may need to be checked in to the copy of nsprpub in mozilla-central as a private patch. Please create a nsprpub/patches directory and document the patch there. Use security/nss/patches as a model. Thanks. Gabriele: PR_CreateThread allows us to specify the priority of the new thread. So it is possible to avoid the locking of pt_book.ml if we change Mozilla/XPCOM at the same time to have a variable of nsIThreadManager::NewThread that takes a 'priority' argument. (Changes to nsThread.cpp may also be necessary.) I am a little worried that the effect of the pt_book.ml locking may be measurable in some benchmark. I wonder if we can call the Bionic function __pthread_gettid(). ::: nsprpub/configure.in @@ +2567,4 @@ > dnl AC_HEADER_DIRENT > dnl AC_HEADER_STDC > dnl AC_HEADER_SYS_WAIT > dnl AC_CHECK_HEADERS(fcntl.h limits.h sys/file.h sys/ioctl.h sys/time.h unistd.h) Can you add the AC_CHECK_HEADERS(sys/syscall.h) after this line? BUT, I think it is fine to omit this header check and just rely on the check of the syscall function. We can assume that if syscall() exists, then <sys/syscall.h> also exists. You made a similar assumption that if setpriority() exists, then <sys/resource.h> exists. @@ +2589,5 @@ > dnl ======================================================== > AC_PROG_GCC_TRADITIONAL > _SAVE_LIBS="$LIBS" > LIBS="$LIBS $OS_LIBS" > +AC_CHECK_FUNCS(gettid lchown setpriority strerror syscall dladdr) Let's list the functions in alphabetical order (move dladdr to the beginning of the list). ::: nsprpub/pr/include/private/primpl.h @@ +191,5 @@ > + * real-time threads. On those platforms we set thread's nice values > + * instead which requires us to track kernel thread IDs for each POSIX > + * thread we create. > + */ > +#if HAVE_SYS_SYSCALL_H Use #ifdef @@ +195,5 @@ > +#if HAVE_SYS_SYSCALL_H > +#include <sys/syscall.h> > +#endif > + > +#if ((HAVE_SYSCALL && defined(SYS_gettid)) || HAVE_GETTID) && HAVE_SETPRIORITY Use defined(HAVE_SYSCALL), defined(HAVE_GETTID), defined(HAVE_SETPRIORITY). ::: nsprpub/pr/src/pthreads/ptthread.c @@ +130,5 @@ > + /* > + * We need to know the kernel thread ID of each thread in order to > + * set its priority hence we do it here instead of at creation time. > + */ > +#if HAVE_SYSCALL && defined(SYS_gettid) Use defined(HAVE_SYSCALL). Make the same change elsewhere. @@ +134,5 @@ > +#if HAVE_SYSCALL && defined(SYS_gettid) > + thred->tid = syscall(SYS_gettid); > +#elif HAVE_GETTID > + thred->tid = gettid(); > +#endif I also suggest we just call syscall(SYS_gettid). @@ +136,5 @@ > +#elif HAVE_GETTID > + thred->tid = gettid(); > +#endif > + > + rv = setpriority(PRIO_PROCESS, thred->tid, pt_PriorityMap(thred->priority)); Call setpriority() after PR_Unlock(). This minimizes the critical section. @@ +684,5 @@ > + PR_Unlock(pt_book.ml); > + > + rv = setpriority(PRIO_PROCESS, thred->tid, pt_PriorityMap(newPri)); > + > + if (EPERM == rv) BUG: this should be rv == -1 && errno == EPERM. This seems to imply you didn't observe the EPERM failure in your testing.
Attachment #718401 - Flags: review?(wtc) → review+
Comment on attachment 718401 [details] [diff] [review] [PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values Review of attachment 718401 [details] [diff] [review]: ----------------------------------------------------------------- ::: nsprpub/pr/src/pthreads/ptthread.c @@ +136,5 @@ > +#elif HAVE_GETTID > + thred->tid = gettid(); > +#endif > + > + rv = setpriority(PRIO_PROCESS, thred->tid, pt_PriorityMap(thred->priority)); On second thought, I think it is safer to call setpriority before the PR_Lock() call. This way this setpriority() call won't interfere with a PR_SetThreadPriority() call made by another thread.
Comment on attachment 718401 [details] [diff] [review] [PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values Review of attachment 718401 [details] [diff] [review]: ----------------------------------------------------------------- ::: nsprpub/pr/src/pthreads/ptthread.c @@ +914,5 @@ > +#if defined(_PR_NICE_PRIORITY_SCHEDULING) > +#if HAVE_SYSCALL && defined(SYS_gettid) > + thred->tid = syscall(SYS_gettid); > +#elif HAVE_GETTID > + thred->tid = gettid(); Please add the same code to pt_AttachThread(), which is used to give a PRThread structure to a thread not created by NSPR.
Comment on attachment 718401 [details] [diff] [review] [PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values Review of attachment 718401 [details] [diff] [review]: ----------------------------------------------------------------- ::: nsprpub/pr/include/private/primpl.h @@ +195,5 @@ > +#if HAVE_SYS_SYSCALL_H > +#include <sys/syscall.h> > +#endif > + > +#if ((HAVE_SYSCALL && defined(SYS_gettid)) || HAVE_GETTID) && HAVE_SETPRIORITY We should add defined(LINUX) && because this should only be used for Linux and its derivatives. This is a property of the Linux kernel.
Thanks both for your comments/review, I'll revise the patch ASAP: (In reply to Ted Mielczarek [:ted.mielczarek] from comment #28) > ::: nsprpub/pr/include/private/primpl.h > @@ +192,5 @@ > > + * instead which requires us to track kernel thread IDs for each POSIX > > + * thread we create. > > + */ > > +#if HAVE_SYS_SYSCALL_H > > +#include <sys/syscall.h> > > It looks like this file keeps all its #includes up top, could you move this > up to match? Sure, I will also remove the HAVE_SYS_SYSCALL_H check as suggested by Wan-Teh. > It would be interesting to know if the _POSIX_THREAD_PRIORITY_SCHEDULING > codepaths actually do something useful on other POSIX platforms, since they > clearly don't work for our purposes on Linux. A look at FreeBSD's documentation indicates that the _POSIX_THREAD_PRIORITY_SCHEDULING paths will work as expected there, NetBSD also seems to honor calls to pthread_setschedparam() when using SCHED_OTHER. It seems to me that in general a way to detect if the priority change has any effect at all is checking the returned values of sched_get_priority_min/max calls, on Linux they both return 0 while on the other platforms I tested they return an actual range (e.g. 0-63 on NetBSD). > @@ +134,5 @@ > > +#if HAVE_SYSCALL && defined(SYS_gettid) > > + thred->tid = syscall(SYS_gettid); > > +#elif HAVE_GETTID > > + thred->tid = gettid(); > > +#endif > > Do we not have gettid in Bionic? (Is that why you made two paths here?) > Could we just use the syscall path in both cases? Bionic has a gettid() prototype but does not define the SYS_gettid macro; glibc on my box (Fedora 17) on the other hand defines SYS_gettid but does not offer a gettid() prototype. To ensure maximum coverage I added both code paths. (In reply to Wan-Teh Chang from comment #29) > PR_CreateThread allows us to specify the priority of the new thread. > So it is possible to avoid the locking of pt_book.ml if we change > Mozilla/XPCOM > at the same time to have a variable of nsIThreadManager::NewThread that takes > a 'priority' argument. (Changes to nsThread.cpp may also be necessary.) Yes, I noticed that the functionality in nsIThreadManager does not expose the capability of PRThreads to have their priority set a startup. I can file a follow-bug to add that bit and then change all the places in the code where we create a thread and then separately change its priority. Still as long as we have PR_SetThreadPriority() around we need to have the locking when creating a new thread to prevent races between it and _pt_root(). > I am a little worried that the effect of the pt_book.ml locking may be > measurable in some benchmark. I wonder if we can call the Bionic function > __pthread_gettid(). __pthread_gettid() does not seem to be exposed publicly anywhere as it sits under libc/private/bionic_pthread.h, besides using it wouldn't solve our problem on glibc-based hosts which do not have it. > BUT, I think it is fine to omit this header check and just rely on the > check of the syscall function. We can assume that if syscall() exists, > then <sys/syscall.h> also exists. You made a similar assumption that > if setpriority() exists, then <sys/resource.h> exists. Yes, I think that's reasonable, I'll remove the check. > @@ +684,5 @@ > > + PR_Unlock(pt_book.ml); > > + > > + rv = setpriority(PRIO_PROCESS, thred->tid, pt_PriorityMap(newPri)); > > + > > + if (EPERM == rv) > > BUG: this should be rv == -1 && errno == EPERM. > > This seems to imply you didn't observe the EPERM failure in your testing. Nice catch, I'll fix it. I'll retest just in case but since I was printing out everything and looking at the output and I always got rv == 0 I think I should have never hit EPERM. (In reply to Wan-Teh Chang from comment #30) > On second thought, I think it is safer to call setpriority before > the PR_Lock() call. This way this setpriority() call won't interfere > with a PR_SetThreadPriority() call made by another thread. You mean doing: thread->tid = gettid(); setpriority(tid, ...); PR_Lock(); PR_NotifyAllCondVar(pt_book.cv); PR_Unlock(); I think this would still be racy as the |tid| value could become visible to another thread before setpriority() has been called. Besides it seems to me that the overhead here is that we will now invoke setpriority() twice on a thread every time we create one and then call PR_SetThreadPriority() on it; addressing that in a follow-up patch we mentioned above should be easy to fix though.
Here's the revised patch, I've address all the issues pointed out in the comments besides rearranging the locking in _pt_root() because it's not clear to me that it would avoid all potential race conditions. Additionally I factored out gettid() so that if the function is not available I will define it on my own to syscall(SYS_gettid). This makes the code that uses it less verbose.
Attachment #718401 - Attachment is obsolete: true
Attachment #718937 - Flags: review?(wtc)
This is the try-run for the latest version of the patch: https://tbpl.mozilla.org/?tree=Try&rev=773030d419d3
wtc, if you're going to land patches in Mercurial, please set the author to be who actually wrote the patch. Thankfully, we don't live in CVS-land anymore. :) https://hg.mozilla.org/integration/mozilla-inbound/rev/ae360bf4bc47
Gabriele: I made three changes to your patch v5. 1. In the PRThread structure, I put the new 'tid' member immediately after the 'id' member because they are closely related. 2. I removed "undef gettid" before the definition of the gettid macro. By the way, I think it is fine to just call syscall(SYS_gettid). 3. In _pt_root, I moved the gettid() and the setpriority() calls outside the critical region. Inside the critical region I only set thred->tid member, because that member is checked inside the critical region in PR_SetThreadPriority. I've pushed this patch to mozilla-inbound. The changeset also includes the creation of the nsprpub/patches directory, with a README file and this patch. This setup will be useful when we need to apply private NSPR patches in the future. reed: how do I set the author? I know just enough Hg to get my job done. Thanks.
Attachment #718937 - Attachment is obsolete: true
Attachment #718937 - Flags: review?(wtc)
(In reply to Gabriele Svelto [:gsvelto] from comment #33) > > A look at FreeBSD's documentation indicates that the > _POSIX_THREAD_PRIORITY_SCHEDULING paths will work as expected there, NetBSD > also seems to honor calls to pthread_setschedparam() when using SCHED_OTHER. Thanks for the research. I agree that the problem we are solving here seems specific to the Linux kernel and we cannot conclude that other platforms would have the same problem automatically. > Bionic has a gettid() prototype but does not define the SYS_gettid macro; > glibc on my box (Fedora 17) on the other hand defines SYS_gettid but does > not offer a gettid() prototype. To ensure maximum coverage I added both code > paths. I'm sorry I missed this comment. This means we can't simply call syscall(SYS_gettid). It would be nice to add this comment to your patch (where you define the gettid macro).
This is the patch against mozilla-b2g18, asking for approval to have it landed there too. NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): 841962, 841041 User impact if declined: Activities using worker threads can make the phone completely unresponsive. Testing completed: Tested on Otoro and Unagi devices as well as a desktop build Risk to taking this patch (and alternatives if risky): The only potential problems are build-related ones but the extensive checks and alternate paths should take care of them. At runtime no ill side-effects can happen as in the worst case the thread priority adjustments will be ignored. String or UUID changes made by this patch: None
Attachment #719218 - Flags: approval-mozilla-b2g18?
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment on attachment 719218 [details] [diff] [review] [PATCH v6 mozilla-b2g18] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values tef+ bugs have automatic uplift approval
Attachment #719218 - Flags: approval-mozilla-b2g18?
(In reply to Ryan VanderMeulen [:RyanVM] from comment #43) > https://hg.mozilla.org/releases/mozilla-b2g18/rev/82b414f81f1b > https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/2b9203badf30 IMPORTANT WARNING: You should have used my mozilla-inbound patchset. I think this is my fault because the patch v6 I attached omitted some changes. Sorry about the confusion. A dummy prdepend.h change is highly recommended because NSPR makefiles don't have proper header dependencies. This patch adds a new member to the PRThread structure, so if another .c file (other than ptthread.c) also uses a member of PRThread but is not recompiled, it will use the old offset of the member and things will go wrong. A fresh rebuild won't have this problem. I also recommend uplifting the new nsprpub/patches/README and nsprpub/patches/linux-setpriority.patch files.
Ryan, see above.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'll close this bug after I have checked in the patch to the NSPR repository. The NSPR repository is closed to checkins right now because it is being migrated from CVS to Hg.
Status: REOPENED → ASSIGNED
Priority: -- → P1
Whiteboard: [target 28/2] → [target 28/2][leave open]
Target Milestone: --- → 4.9.6
Version: 4.9.6 → other
And yes, when there's a non-obsolete branch-specific patch attached to a bug, I'm naturally going to assume that it's what needs to be landed.
(In reply to Wan-Teh Chang from comment #44) > You should have used my mozilla-inbound patchset. I think this is my > fault because the patch v6 I attached omitted some changes. Sorry > about the confusion. My fault, I had checked your commit and noticed the change to prdepend.h but then forgot to add it back to the new patch. That'll teach me not to prepare patches after midnight.
Wan-Teh, this is causing bug 689256 to occur on Linux debug with very high frequency. Normally, that would be cause for automatic backout. However, this is complicated by the fact that this is a tef+ bug that's already been uplifted to the other branches. Sorry for taking so long to figure this out, but it's taken awhile to go back and bisect with the tree in as bad of shape as it was last week :-( Anyway, but 689256 needs immediate attention or I will have to back this out on trunk. Please look into it ASAP.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #50) > Anyway, but 689256 needs immediate attention or I will have to back this out > on trunk. Please look into it ASAP. I'll try to reproduce it tomorrow and see if I can fix it because we need this patch on FxOS, otherwise certain activities can make the phones completely unresponsive. Let's follow up directly in bug 689256.
Bug 841651 tracks the failure of browser_522375.js, a test that seems to rely on timing somehow... That's obviously a bad idea. This patch could have changed some of our timing a little bit, so I am not too surprised that at least one test fails now. I'd say we should disable browser_522375.js rather than back this out, though.
RyanVM: thank you for letting us know. Gabriele is already making good progress on tracking down bug 689256. I'll leave that to his capable hands.
Are things still going alright here, Gabriele?
Flags: needinfo?(gsvelto)
(In reply to Andrew Overholt [:overholt] from comment #54) > Are things still going alright here, Gabriele? Yes, the patch already landed on b2g18 and central fixing the problem. The bug is still open only because Wan-Teh still needs to commit it in the NSPR repository when the CVS-to-Mercurial migration will be finished.
Flags: needinfo?(gsvelto)
Whiteboard: [target 28/2][leave open] → [target 28/2][leave open][NPOTB]
Comment on attachment 719156 [details] [diff] [review] [PATCH v6] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values, by Gabriele Svelto Patch pushed to the new NSPR hg repository, with Gabriele correctly listed as the author this time :-) http://hg.mozilla.org/projects/nspr/rev/39cb7968bf9b Patch checked in on the NSPR CVS trunk (NSPR 4.9.6). Checking in configure; /cvsroot/mozilla/nsprpub/configure,v <-- configure new revision: 1.339; previous revision: 1.338 done Checking in configure.in; /cvsroot/mozilla/nsprpub/configure.in,v <-- configure.in new revision: 1.342; previous revision: 1.341 done Checking in pr/include/private/primpl.h; /cvsroot/mozilla/nsprpub/pr/include/private/primpl.h,v <-- primpl.h new revision: 3.97; previous revision: 3.96 done Checking in pr/src/pthreads/ptthread.c; /cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v <-- ptthread.c new revision: 3.102; previous revision: 3.101 done
Attachment #719156 - Flags: checked-in+
Status: ASSIGNED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Whiteboard: [target 28/2][leave open][NPOTB] → [target 28/2][NPOTB]
I think this should be backed out. I've filed bug 861434 explaining why. I'd like to file a separate bug for re-implementing the functionality desired in this bug.
Whiteboard: [target 28/2][NPOTB] → [target 28/2]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: