Closed
Bug 841651
Opened 12 years ago
Closed 12 years ago
[OTA] PR_SetThreadPriority doesn't do anything when using Bionic
Categories
(NSPR :: NSPR, defect, P1)
Tracking
(blocking-b2g:tef+, firefox20 wontfix, firefox21 wontfix, firefox22 fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)
People
(Reporter: dhylands, Assigned: gsvelto)
References
Details
(Whiteboard: [target 28/2])
Attachments
(2 files, 7 obsolete files)
(deleted),
patch
|
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review |
So in talking with bent on IRC, RuntimeService::ScheduleWorker (http://mxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#933) supposedly lowers the priority of DOM Worker threads.
The priority->SetPriority resolves to nsThread::SetPriority (http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#659)
which in turn calls PR_SetThreadPriority (http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptthread.c#613)
Under bionic, neither _PR_DCETHREADS nor _POSIX_THREAD_PRIORITY_SCHEDULING is defined (I confirmed by adding a #else and a print and saw the print).
This means that calls to PR_SetThreadPriority essentially do nothing under bionic.
It basically means that Workers can swamp the main thread on B2G (and fennec I think)...
Reporter | ||
Comment 2•12 years ago
|
||
This is further confirmed by doing:
736 >adb shell 'cat /proc/109/task/*/stat' | grep DOM
250 (DOM Worker) S 1 109 0 0 -1 4194368 2204 1881 1 0 37 5 0 2 20 0 27 0 660 164769792 14170 4294967295 32768 126328 3197418368 1138752904 1074603756 0 0 4096 50927 3222128320 0 0 -1 0 0 0 0 0 0
252 (DOM Worker) S 1 109 0 0 -1 4194624 1405 1881 0 0 7 18 0 2 20 0 27 0 677 164769792 14170 4294967295 32768 126328 3197418368 1149238664 1074603756 0 0 4096 50927 3222128320 0 0 -1 0 0 0 0 0 0
253 (DOM Worker) S 1 109 0 0 -1 4194368 1328 1881 0 0 10 1 0 2 20 0 27 0 677 164769792 14170 4294967295 32768 126328 3197418368 1151202184 1074603352 0 0 4096 50927 3222519780 0 0 -1 0 0 0 0 0 0
254 (DOM Worker) S 1 109 0 0 -1 4194624 1360 1881 0 0 4 1 0 2 20 0 27 0 686 164769792 14170 4294967295 32768 126328 3197418368 1164967304 1074603756 0 0 4096 50927 3222128320 0 0 -1 0 0 0 0 0 0
(109 = pid of b2g main process).
Column 18 of /proc/PID/stat is the priority, and column 19 is the nice value. 20 = the default priority (confirmed by nice = 0).
Reporter | ||
Updated•12 years ago
|
Assignee: nobody → wtc
Component: General → NSPR
Product: Boot2Gecko → NSPR
Version: unspecified → 4.9.6
Updated•12 years ago
|
Summary: PR_SetThreadPriority doesn't do anything under gonk → PR_SetThreadPriority doesn't do anything when using Bionic
Comment 3•12 years ago
|
||
In Chromium's platform_thread_posix.cc, PlatformThread::SetThreadPriority
calls into Java on Android:
http://src.chromium.org/viewvc/chrome/trunk/src/base/threading/platform_thread_posix.cc?view=markup
I guess that approach won't work on Firefox OS. Can you try using
the _POSIX_THREAD_PRIORITY_SCHEDULING code? The _PR_DCETHREADS
code is obsolete.
Updated•12 years ago
|
blocking-b2g: --- → tef?
Comment 4•12 years ago
|
||
Have we seen any user fallout from this bug? Or is this mostly theoretical/correctness.
Bug 841041 is one thing that users are seeing in the real world.
Comment 6•12 years ago
|
||
Blocking since our OTA update experience needs to be rock solid.
Updated•12 years ago
|
blocking-b2g: tef? → tef+
Assignee | ||
Comment 7•12 years ago
|
||
Since there hasn't been much action here in the past few days I'm taking this, feel free to re-take if you have a solution ready.
Looking at the bionic sources it seems that pthread_setschedparam() is properly implemented:
https://android.googlesource.com/platform/bionic/+/master/libc/bionic/pthread.c
However other functions are not (such as pthread_attr_setinheritsched() and friends) so defining _POSIX_THREAD_PRIORITY_SCHEDULING in ptthread.c when on Android doesn't cut it. What we need is some slightly finer grained tests to check which functions are available and which are not instead of depending on the _POSIX_THREAD_PRIORITY_SCHEDULING macro being defined. I'll attach a WIP patch shortly.
Assignee: wtc → gsvelto
Status: NEW → ASSIGNED
Assignee | ||
Comment 8•12 years ago
|
||
This is a crude WIP patch that ensures that pthread_setschedaparam() & friends are being used throughout ptthread.c. I'm using it for testing and I can see the calls being executed successfully. I'll try to reproduce the OTA issue without the patch applied and see if it fixes it; once I'm done with the testing I'll put up a proper patch with the appropriate configure-time checks and changes.
Assignee | ||
Comment 9•12 years ago
|
||
This is a cleaned up patch against NSPR that checks for pthread_attr_setinheritsched() which is the only function that seems to be missing from Android and then avoids its uses in the ptthread.c code. The functionality of PR_SetThreadPriority() if I understand the code correctly (and real time POSIX threads extensions).
I've tested the patch and verified that pthread_setschedparam() is being invoked and returns successfully however the priorities under /proc (comment 2) seem unaffected. I will do some further tests to see if I can reproduce the issue and if the patch fixes it.
BTW it applies cleanly to both mozilla-central and mozilla-b2g18 so it should be easy to test it on both.
Attachment #715439 -
Attachment is obsolete: true
Attachment #715473 -
Flags: feedback?(dhylands)
Reporter | ||
Comment 10•12 years ago
|
||
I think that trying to set the priority directly only works with the real-time scheduling policies SCHED_FIFO and SCHED_RR (and the lowest realtime priority is higher than the highest normal priority, so we don't want to do that).
The scheduling policy is in column 40 (of /proc/PID/stat) and since it's 0 that corresponds to SCHED_OTHER (the normal linux scheduling policy). I think we need to set the nice value, although the docs suggest that the nice value is per-process and not per-thread
(http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html)
Although other documentation seems to contradict this (and some of the confusion comes from the kernel's notion of pid being different from the user space notion of pid).
If we aren't seeing the priority changes reflected in /proc/PID/stat, then I don't think it's actually doing anything. So I think we need to figure that part out first, and then determine if this is even the right API to be calling.
Reporter | ||
Comment 11•12 years ago
|
||
Comment on attachment 715473 [details] [diff] [review]
[PATCH] Make PR_SetThreadPriority() work when using Bionic
Review of attachment 715473 [details] [diff] [review]:
-----------------------------------------------------------------
This looks good to me, although if we're not seeing the priority reflected in /proc/PID/stat then I think we need another bug filed in order to get the DOM workers actually running at a lower priority.
Attachment #715473 -
Flags: feedback?(dhylands) → feedback+
Comment 12•12 years ago
|
||
Gabriele: sorry about the confusion. I am the default bug assignee
for NSPR bugs. I understand this bug has high priority and I have
done what I can, but I don't know how to build NSPR for Android.
I just checked the Apache Portable Runtime source code but could
not find any function for setting thread priority. For example,
searching for "prio" and "sched" in the apr_thread_proc.h header
finds nothing:
http://svn.apache.org/viewvc/apr/apr/trunk/include/apr_thread_proc.h?view=markup
Another thing you can try is the setpriority() function used in
the Chromium source file that I cited in comment 3. I missed that
last time, but it's also in Android-specific code.
Assignee | ||
Comment 13•12 years ago
|
||
(In reply to Dave Hylands [:dhylands] from comment #10)
> I think that trying to set the priority directly only works with the
> real-time scheduling policies SCHED_FIFO and SCHED_RR (and the lowest
> realtime priority is higher than the highest normal priority, so we don't
> want to do that).
I see, so pthread_setschedparam() might just be ignored by the kernel in our case.
> The scheduling policy is in column 40 (of /proc/PID/stat) and since it's 0
> that corresponds to SCHED_OTHER (the normal linux scheduling policy). I
> think we need to set the nice value, although the docs suggest that the nice
> value is per-process and not per-thread
> (http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html)
>
> Although other documentation seems to contradict this (and some of the
> confusion comes from the kernel's notion of pid being different from the
> user space notion of pid).
We'll have to check, if the nice value can be set per thread then we could use that instead.
> If we aren't seeing the priority changes reflected in /proc/PID/stat, then I
> don't think it's actually doing anything. So I think we need to figure that
> part out first, and then determine if this is even the right API to be
> calling.
Agreed, I'll investigate a bit more. Bionic is using the sched_setscheduler() syscall, I'll dig out the source for that and see if we can coerce it into lowering those threads priorities.
Assignee | ||
Comment 14•12 years ago
|
||
(In reply to Wan-Teh Chang from comment #12)
> Gabriele: sorry about the confusion. I am the default bug assignee
> for NSPR bugs. I understand this bug has high priority and I have
> done what I can, but I don't know how to build NSPR for Android.
OK, now I understand, I picked up the bug in a hurry because we really need it done by the end of the week. As always when taking a bug from somebody else I was a little bit hesitant because from time to time it might not be the nicest thing to do.
> I just checked the Apache Portable Runtime source code but could
> not find any function for setting thread priority. For example,
> searching for "prio" and "sched" in the apr_thread_proc.h header
> finds nothing:
> http://svn.apache.org/viewvc/apr/apr/trunk/include/apr_thread_proc.
> h?view=markup
>
> Another thing you can try is the setpriority() function used in
> the Chromium source file that I cited in comment 3. I missed that
> last time, but it's also in Android-specific code.
Yes, I'll have a look. There must be a way because the underlying kernel should support setting thread priorities so it's just a matter of finding the right button to push.
Comment 15•12 years ago
|
||
I studied the Bionic libc source code and the Linux
sched_setscheduler man page.
By defaults, threads get the SCHED_OTHER (aka SCHED_NORMAL,
value 0) scheduling policy. The threads in this scheduling
policy only have a nice value (dynamic priority) that can
be set with setpriority(2) or nice(2).
To use the pthreads priority, the threads need to have the
SCHED_FIFO or SCHED_RR scheduling policy. The threads in
these two scheduling policies have a static priority.
I guess you can't use SCHED_FIFO and SCHED_RR because you
are trying to lower the priority of the DOM Worker threads,
right?
Note: Linux threads are scheduled in the same way that
processes are scheduled. So although the sched_setscheduler
man page talks only about processes, I believe it applies
to threads as well. See also
http://stackoverflow.com/questions/7684404/is-nice-used-to-change-the-thread-priority-or-the-process-priority
and
http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html
(search for "nice").
Reporter | ||
Comment 16•12 years ago
|
||
(In reply to Wan-Teh Chang from comment #15)
...snip...
> I guess you can't use SCHED_FIFO and SCHED_RR because you
> are trying to lower the priority of the DOM Worker threads,
> right?
Right. Those both set the process to use the realtime priorities, and the lowest realtime priority is higher than the highest normal priority.
> Note: Linux threads are scheduled in the same way that
> processes are scheduled. So although the sched_setscheduler
> man page talks only about processes, I believe it applies
> to threads as well. See also
> http://stackoverflow.com/questions/7684404/is-nice-used-to-change-the-thread-
> priority-or-the-process-priority
>
> and
>
> http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html
>
> (search for "nice").
Yeah - I think we need to verify that we can, in fact, use setpriority/nice for threads. If they do work, we can either modify PR_SetThreadPriority to use those instead, or fix the calling code to call them.
Assignee | ||
Comment 17•12 years ago
|
||
(In reply to Wan-Teh Chang from comment #15)
> I studied the Bionic libc source code and the Linux
> sched_setscheduler man page.
>
> [snip]
>
> Note: Linux threads are scheduled in the same way that
> processes are scheduled. So although the sched_setscheduler
> man page talks only about processes, I believe it applies
> to threads as well. See also
> http://stackoverflow.com/questions/7684404/is-nice-used-to-change-the-thread-
> priority-or-the-process-priority
>
> and
>
> http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html
>
> (search for "nice").
Thanks for investigating! As Dave mentioned I'll try changing the nice value for threads today and if it works I'll re-implement PR_SetThreadPriority() using the necessary calls.
Assignee | ||
Comment 18•12 years ago
|
||
After reading more documentation I'm under the impression that we should be able to re-nice a thread with the following combination: using gettid() to obtain the thread ID and then using this value when calling setpriority(PRIO_PROCESS, <thread id>, <nice value>). This *should* allow us to change the nice value of a single thread; I'll try to verify if it works first locally and then on the device and if it does I'll cook up a patch using this combination in PR_SetThreadPriority().
Comment 19•12 years ago
|
||
Gabriele: yes. We should be able to set the nice value of the current
thread. We should be able to pass 0 or some symbolic constant as the
second argument to setpriority() to indicate the current thread/process.
It should not be necessary to call gettid(). Please try both.
As for setting the nice value of an arbitrary thread, unless there is
a non-standard pthread function that converts a pthread_t to a pid_t,
we won't be able to do that. This limitation is not a problem for your
needs.
So you can add a check in PR_SetThreadPriority to compare the PRThread*
argument with PR_GetCurrentThread(), and only call setpriority() if
they are equal.
Also, NSPR has four priority levels: LOW, NORMAL, HIGH, and URGENT.
If it requires root privilege to set a negative nice value, it is fine
to map NORMAL, HIGH, and URGENT to the nice value 0 and map LOW to an
appropriate positive nice value. We do not need to solve this problem
completely.
Assignee | ||
Comment 20•12 years ago
|
||
(In reply to Wan-Teh Chang from comment #19)
> Gabriele: yes. We should be able to set the nice value of the current
> thread. We should be able to pass 0 or some symbolic constant as the
> second argument to setpriority() to indicate the current thread/process.
> It should not be necessary to call gettid(). Please try both.
I will though I think that in that case the priority of the whole process is affected.
> As for setting the nice value of an arbitrary thread, unless there is
> a non-standard pthread function that converts a pthread_t to a pid_t,
> we won't be able to do that. This limitation is not a problem for your
> needs.
>
> So you can add a check in PR_SetThreadPriority to compare the PRThread*
> argument with PR_GetCurrentThread(), and only call setpriority() if
> they are equal.
I haven't found any way to convert between pthread_t handles and pid_t values however I was experimenting with adding a pid_t field to |struct PRThread| and then setting it with gettid() in _pt_root(). Does that sound like a reasonable solution to you? It shouldn't be too invasive and would give us all the full functionality of PR_SetThreadPriority().
> Also, NSPR has four priority levels: LOW, NORMAL, HIGH, and URGENT.
> If it requires root privilege to set a negative nice value, it is fine
> to map NORMAL, HIGH, and URGENT to the nice value 0 and map LOW to an
> appropriate positive nice value. We do not need to solve this problem
> completely.
From my testing LOW seems to be translated by pt_PriorityMap() into the value 1 which is accepted by the kernel. I'll try with the other values and see if they also work, if they don't I'll make sure they're mapped to 0.
Assignee | ||
Comment 21•12 years ago
|
||
This patch implements PR_SetThreadPriority() by doing the following:
- adding a 'tid' value to each PRThread object obtained via gettid() during thread initialization
- modifying PR_SetThreadPriority to use setpriority() to change the nice value of a thread
- using pt_PriorityMap to map the NSPR priority values to the following nice values:
PR_PRIORITY_LOW 1
PR_PRIORITY_NORMAL 0
PR_PRIORITY_HIGH -1
PR_PRIORITY_URGENT -2
I've verified that the negative nice values can be set on the device and I've also verified that the DOM Worker threads are getting a lower priority by looking them up under /proc/stat.
Attachment #715473 -
Attachment is obsolete: true
Attachment #716112 -
Flags: review?(wtc)
Assignee | ||
Comment 22•12 years ago
|
||
This is the try run for the attached patch:
https://tbpl.mozilla.org/?tree=Try&rev=1c8ce565ce77
Do you think you can review the patch soon? If you're busy I can see if somebody else can do the review; the issue here being that we need to land this before the 28th of February so the sooner the better.
Updated•12 years ago
|
Whiteboard: [target 28/2]
Assignee | ||
Comment 23•12 years ago
|
||
I've had some time to revise the patch and extend it since I noticed that this problem wasn't affecting only Android but also Linux for the very same reason. This patch is more extensive and works on both systems by carefully checking what's available to set the nice value. In addition to this it also properly set the priority of new threads at creation time.
This patch still has an issue however in that PR_SetThreadPriority() is racy due to _pt_root() setting the |tid| value when starting up. If the calling thread quickly calls PR_SetThreadPriority() on a newly created thread it might see a (harmless) value of 0 in |tid|. A quick fix could be done by marking tid volatile and spinning over it until it gets set, a more thorough one would be adding another mutex & condition variable to pt_book used to serialize access to |tid| but that seems overkill to me for an essentially harmless condition.
Attachment #716112 -
Attachment is obsolete: true
Attachment #716112 -
Flags: review?(wtc)
Attachment #717885 -
Flags: review?(wtc)
Updated•12 years ago
|
Summary: PR_SetThreadPriority doesn't do anything when using Bionic → [OTA] PR_SetThreadPriority doesn't do anything when using Bionic
Comment 24•12 years ago
|
||
Comment on attachment 717885 [details] [diff] [review]
[PATCH v3] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values
Review of attachment 717885 [details] [diff] [review]:
-----------------------------------------------------------------
Gabriele: thanks a lot for writing the patch. I was
sick since last Wednesday. I should have told you
who else you can ask for review. (I didn't expect
the cold to be so serious.) Sorry! In the future
you can ask ted or bsmith to review NSPR patches.
I will review the patch tomorrow.
I noticed the problem of setting the tid field.
I remember I used a lock that you described before,
but apparently that code is gone. If you only need
to set the priority of the current thread, we can
also avoid this problem.
Assignee | ||
Comment 25•12 years ago
|
||
(In reply to Wan-Teh Chang from comment #24)
> Gabriele: thanks a lot for writing the patch. I was
> sick since last Wednesday. I should have told you
> who else you can ask for review. (I didn't expect
> the cold to be so serious.) Sorry! In the future
> you can ask ted or bsmith to review NSPR patches.
> I will review the patch tomorrow.
Thank you for doing this and thanks for the heads up to the other reviewers. I can relate to you because a cold knocked me out starting Friday morning and lasting until Monday :)
> I noticed the problem of setting the tid field.
> I remember I used a lock that you described before,
> but apparently that code is gone. If you only need
> to set the priority of the current thread, we can
> also avoid this problem.
Unfortunately not, what the code does is create a new thread and then set its priority so the change always happens on another thread. I'll revise the patch and use the mutex & condition variable in |pt_book| to remove the race.
Assignee | ||
Comment 26•12 years ago
|
||
Revised non-racy patch, uses |pt_book.lock| and |pt_book.cv| to synchronize access to the |tid| field when a new thread starts up. Also initializes the main thread's |tid| field in _PR_InitThreads(). This fixes three distinct issues with the previous patch:
- The main thread not being able to call PR_SetThreadPriority() on itself as it's |tid| value was never initialized
- A thread quickly calling PR_SetThreadPriority() on a newly created thread could see a |tid| value of 0 due to a race-condition
- The corner-case where a thread would be created with a non-default priority argument and then its priority changed quickly thereafter. This was also a potential race between _pt_root() and PR_SetThreadPriority() though unlikely in practice because this pattern is never used in the codebase
I've manually checked that the nice value for the DOM Worker threads is being set correctly both on an Otoro device with B2G and on my Linux desktop build.
Attachment #717885 -
Attachment is obsolete: true
Attachment #717885 -
Flags: review?(wtc)
Attachment #718367 -
Flags: review?(wtc)
Assignee | ||
Comment 27•12 years ago
|
||
I mistakenly attached a patch that had a formatting issue, this is the correct one.
Attachment #718367 -
Attachment is obsolete: true
Attachment #718367 -
Flags: review?(wtc)
Attachment #718401 -
Flags: review?(wtc)
Updated•12 years ago
|
status-b2g18:
--- → affected
status-b2g18-v1.0.1:
--- → affected
Comment 28•12 years ago
|
||
Comment on attachment 718401 [details] [diff] [review]
[PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values
Review of attachment 718401 [details] [diff] [review]:
-----------------------------------------------------------------
I spent some time reading the code here to familiarize myself with it. After reading that, this patch looks good. I was a little concerned about the reuse of the CondVar, but it looks like this won't cause any problems. If you can fix my few comments I think we can land this. Wan-Teh may have additional comments, but I think we can fix those after the fact, since it sounds like he's okay with the approach here.
::: nsprpub/pr/include/private/primpl.h
@@ +192,5 @@
> + * instead which requires us to track kernel thread IDs for each POSIX
> + * thread we create.
> + */
> +#if HAVE_SYS_SYSCALL_H
> +#include <sys/syscall.h>
It looks like this file keeps all its #includes up top, could you move this up to match?
::: nsprpub/pr/src/pthreads/ptthread.c
@@ +28,5 @@
> #undef _POSIX_THREAD_PRIORITY_SCHEDULING
> #endif
>
> +#if defined(_PR_NICE_PRIORITY_SCHEDULING)
> +#undef _POSIX_THREAD_PRIORITY_SCHEDULING
It would be interesting to know if the _POSIX_THREAD_PRIORITY_SCHEDULING codepaths actually do something useful on other POSIX platforms, since they clearly don't work for our purposes on Linux.
@@ +134,5 @@
> +#if HAVE_SYSCALL && defined(SYS_gettid)
> + thred->tid = syscall(SYS_gettid);
> +#elif HAVE_GETTID
> + thred->tid = gettid();
> +#endif
Do we not have gettid in Bionic? (Is that why you made two paths here?) Could we just use the syscall path in both cases?
@@ +914,5 @@
> +#if defined(_PR_NICE_PRIORITY_SCHEDULING)
> +#if HAVE_SYSCALL && defined(SYS_gettid)
> + thred->tid = syscall(SYS_gettid);
> +#elif HAVE_GETTID
> + thred->tid = gettid();
Same here as above.
Comment 29•12 years ago
|
||
Comment on attachment 718401 [details] [diff] [review]
[PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values
Review of attachment 718401 [details] [diff] [review]:
-----------------------------------------------------------------
r=wtc. I am giving you my review+ in advance because the changes I suggest
are very minor.
Ted: NSPR source code is being migrated from CVS to Hg. So this may need
to be checked in to the copy of nsprpub in mozilla-central as a private patch.
Please create a nsprpub/patches directory and document the patch there.
Use security/nss/patches as a model. Thanks.
Gabriele: PR_CreateThread allows us to specify the priority of the new thread.
So it is possible to avoid the locking of pt_book.ml if we change Mozilla/XPCOM
at the same time to have a variable of nsIThreadManager::NewThread that takes
a 'priority' argument. (Changes to nsThread.cpp may also be necessary.)
I am a little worried that the effect of the pt_book.ml locking may be
measurable in some benchmark. I wonder if we can call the Bionic function
__pthread_gettid().
::: nsprpub/configure.in
@@ +2567,4 @@
> dnl AC_HEADER_DIRENT
> dnl AC_HEADER_STDC
> dnl AC_HEADER_SYS_WAIT
> dnl AC_CHECK_HEADERS(fcntl.h limits.h sys/file.h sys/ioctl.h sys/time.h unistd.h)
Can you add the AC_CHECK_HEADERS(sys/syscall.h) after this line?
BUT, I think it is fine to omit this header check and just rely on the
check of the syscall function. We can assume that if syscall() exists,
then <sys/syscall.h> also exists. You made a similar assumption that
if setpriority() exists, then <sys/resource.h> exists.
@@ +2589,5 @@
> dnl ========================================================
> AC_PROG_GCC_TRADITIONAL
> _SAVE_LIBS="$LIBS"
> LIBS="$LIBS $OS_LIBS"
> +AC_CHECK_FUNCS(gettid lchown setpriority strerror syscall dladdr)
Let's list the functions in alphabetical order (move dladdr to the
beginning of the list).
::: nsprpub/pr/include/private/primpl.h
@@ +191,5 @@
> + * real-time threads. On those platforms we set thread's nice values
> + * instead which requires us to track kernel thread IDs for each POSIX
> + * thread we create.
> + */
> +#if HAVE_SYS_SYSCALL_H
Use #ifdef
@@ +195,5 @@
> +#if HAVE_SYS_SYSCALL_H
> +#include <sys/syscall.h>
> +#endif
> +
> +#if ((HAVE_SYSCALL && defined(SYS_gettid)) || HAVE_GETTID) && HAVE_SETPRIORITY
Use defined(HAVE_SYSCALL), defined(HAVE_GETTID), defined(HAVE_SETPRIORITY).
::: nsprpub/pr/src/pthreads/ptthread.c
@@ +130,5 @@
> + /*
> + * We need to know the kernel thread ID of each thread in order to
> + * set its priority hence we do it here instead of at creation time.
> + */
> +#if HAVE_SYSCALL && defined(SYS_gettid)
Use defined(HAVE_SYSCALL). Make the same change elsewhere.
@@ +134,5 @@
> +#if HAVE_SYSCALL && defined(SYS_gettid)
> + thred->tid = syscall(SYS_gettid);
> +#elif HAVE_GETTID
> + thred->tid = gettid();
> +#endif
I also suggest we just call syscall(SYS_gettid).
@@ +136,5 @@
> +#elif HAVE_GETTID
> + thred->tid = gettid();
> +#endif
> +
> + rv = setpriority(PRIO_PROCESS, thred->tid, pt_PriorityMap(thred->priority));
Call setpriority() after PR_Unlock(). This minimizes the critical section.
@@ +684,5 @@
> + PR_Unlock(pt_book.ml);
> +
> + rv = setpriority(PRIO_PROCESS, thred->tid, pt_PriorityMap(newPri));
> +
> + if (EPERM == rv)
BUG: this should be rv == -1 && errno == EPERM.
This seems to imply you didn't observe the EPERM failure in your testing.
Attachment #718401 -
Flags: review?(wtc) → review+
Comment 30•12 years ago
|
||
Comment on attachment 718401 [details] [diff] [review]
[PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values
Review of attachment 718401 [details] [diff] [review]:
-----------------------------------------------------------------
::: nsprpub/pr/src/pthreads/ptthread.c
@@ +136,5 @@
> +#elif HAVE_GETTID
> + thred->tid = gettid();
> +#endif
> +
> + rv = setpriority(PRIO_PROCESS, thred->tid, pt_PriorityMap(thred->priority));
On second thought, I think it is safer to call setpriority before
the PR_Lock() call. This way this setpriority() call won't interfere
with a PR_SetThreadPriority() call made by another thread.
Comment 31•12 years ago
|
||
Comment on attachment 718401 [details] [diff] [review]
[PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values
Review of attachment 718401 [details] [diff] [review]:
-----------------------------------------------------------------
::: nsprpub/pr/src/pthreads/ptthread.c
@@ +914,5 @@
> +#if defined(_PR_NICE_PRIORITY_SCHEDULING)
> +#if HAVE_SYSCALL && defined(SYS_gettid)
> + thred->tid = syscall(SYS_gettid);
> +#elif HAVE_GETTID
> + thred->tid = gettid();
Please add the same code to pt_AttachThread(), which is used to give a
PRThread structure to a thread not created by NSPR.
Comment 32•12 years ago
|
||
Comment on attachment 718401 [details] [diff] [review]
[PATCH v4] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values
Review of attachment 718401 [details] [diff] [review]:
-----------------------------------------------------------------
::: nsprpub/pr/include/private/primpl.h
@@ +195,5 @@
> +#if HAVE_SYS_SYSCALL_H
> +#include <sys/syscall.h>
> +#endif
> +
> +#if ((HAVE_SYSCALL && defined(SYS_gettid)) || HAVE_GETTID) && HAVE_SETPRIORITY
We should add
defined(LINUX) &&
because this should only be used for Linux and its
derivatives. This is a property of the Linux kernel.
Assignee | ||
Comment 33•12 years ago
|
||
Thanks both for your comments/review, I'll revise the patch ASAP:
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #28)
> ::: nsprpub/pr/include/private/primpl.h
> @@ +192,5 @@
> > + * instead which requires us to track kernel thread IDs for each POSIX
> > + * thread we create.
> > + */
> > +#if HAVE_SYS_SYSCALL_H
> > +#include <sys/syscall.h>
>
> It looks like this file keeps all its #includes up top, could you move this
> up to match?
Sure, I will also remove the HAVE_SYS_SYSCALL_H check as suggested by Wan-Teh.
> It would be interesting to know if the _POSIX_THREAD_PRIORITY_SCHEDULING
> codepaths actually do something useful on other POSIX platforms, since they
> clearly don't work for our purposes on Linux.
A look at FreeBSD's documentation indicates that the _POSIX_THREAD_PRIORITY_SCHEDULING paths will work as expected there, NetBSD also seems to honor calls to pthread_setschedparam() when using SCHED_OTHER. It seems to me that in general a way to detect if the priority change has any effect at all is checking the returned values of sched_get_priority_min/max calls, on Linux they both return 0 while on the other platforms I tested they return an actual range (e.g. 0-63 on NetBSD).
> @@ +134,5 @@
> > +#if HAVE_SYSCALL && defined(SYS_gettid)
> > + thred->tid = syscall(SYS_gettid);
> > +#elif HAVE_GETTID
> > + thred->tid = gettid();
> > +#endif
>
> Do we not have gettid in Bionic? (Is that why you made two paths here?)
> Could we just use the syscall path in both cases?
Bionic has a gettid() prototype but does not define the SYS_gettid macro; glibc on my box (Fedora 17) on the other hand defines SYS_gettid but does not offer a gettid() prototype. To ensure maximum coverage I added both code paths.
(In reply to Wan-Teh Chang from comment #29)
> PR_CreateThread allows us to specify the priority of the new thread.
> So it is possible to avoid the locking of pt_book.ml if we change
> Mozilla/XPCOM
> at the same time to have a variable of nsIThreadManager::NewThread that takes
> a 'priority' argument. (Changes to nsThread.cpp may also be necessary.)
Yes, I noticed that the functionality in nsIThreadManager does not expose the capability of PRThreads to have their priority set a startup. I can file a follow-bug to add that bit and then change all the places in the code where we create a thread and then separately change its priority. Still as long as we have PR_SetThreadPriority() around we need to have the locking when creating a new thread to prevent races between it and _pt_root().
> I am a little worried that the effect of the pt_book.ml locking may be
> measurable in some benchmark. I wonder if we can call the Bionic function
> __pthread_gettid().
__pthread_gettid() does not seem to be exposed publicly anywhere as it sits under libc/private/bionic_pthread.h, besides using it wouldn't solve our problem on glibc-based hosts which do not have it.
> BUT, I think it is fine to omit this header check and just rely on the
> check of the syscall function. We can assume that if syscall() exists,
> then <sys/syscall.h> also exists. You made a similar assumption that
> if setpriority() exists, then <sys/resource.h> exists.
Yes, I think that's reasonable, I'll remove the check.
> @@ +684,5 @@
> > + PR_Unlock(pt_book.ml);
> > +
> > + rv = setpriority(PRIO_PROCESS, thred->tid, pt_PriorityMap(newPri));
> > +
> > + if (EPERM == rv)
>
> BUG: this should be rv == -1 && errno == EPERM.
>
> This seems to imply you didn't observe the EPERM failure in your testing.
Nice catch, I'll fix it. I'll retest just in case but since I was printing out everything and looking at the output and I always got rv == 0 I think I should have never hit EPERM.
(In reply to Wan-Teh Chang from comment #30)
> On second thought, I think it is safer to call setpriority before
> the PR_Lock() call. This way this setpriority() call won't interfere
> with a PR_SetThreadPriority() call made by another thread.
You mean doing:
thread->tid = gettid();
setpriority(tid, ...);
PR_Lock();
PR_NotifyAllCondVar(pt_book.cv);
PR_Unlock();
I think this would still be racy as the |tid| value could become visible to another thread before setpriority() has been called. Besides it seems to me that the overhead here is that we will now invoke setpriority() twice on a thread every time we create one and then call PR_SetThreadPriority() on it; addressing that in a follow-up patch we mentioned above should be easy to fix though.
Assignee | ||
Comment 34•12 years ago
|
||
Here's the revised patch, I've address all the issues pointed out in the comments besides rearranging the locking in _pt_root() because it's not clear to me that it would avoid all potential race conditions. Additionally I factored out gettid() so that if the function is not available I will define it on my own to syscall(SYS_gettid). This makes the code that uses it less verbose.
Attachment #718401 -
Attachment is obsolete: true
Attachment #718937 -
Flags: review?(wtc)
Assignee | ||
Comment 35•12 years ago
|
||
This is the try-run for the latest version of the patch:
https://tbpl.mozilla.org/?tree=Try&rev=773030d419d3
Comment 36•12 years ago
|
||
wtc, if you're going to land patches in Mercurial, please set the author to be who actually wrote the patch. Thankfully, we don't live in CVS-land anymore. :)
https://hg.mozilla.org/integration/mozilla-inbound/rev/ae360bf4bc47
Comment 37•12 years ago
|
||
Gabriele: I made three changes to your patch v5.
1. In the PRThread structure, I put the new 'tid' member immediately
after the 'id' member because they are closely related.
2. I removed "undef gettid" before the definition of the gettid macro.
By the way, I think it is fine to just call syscall(SYS_gettid).
3. In _pt_root, I moved the gettid() and the setpriority() calls outside
the critical region. Inside the critical region I only set thred->tid
member, because that member is checked inside the critical region in
PR_SetThreadPriority.
I've pushed this patch to mozilla-inbound. The changeset also includes
the creation of the nsprpub/patches directory, with a README file and
this patch. This setup will be useful when we need to apply private
NSPR patches in the future.
reed: how do I set the author? I know just enough Hg to get my job done.
Thanks.
Attachment #718937 -
Attachment is obsolete: true
Attachment #718937 -
Flags: review?(wtc)
Comment 38•12 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #33)
>
> A look at FreeBSD's documentation indicates that the
> _POSIX_THREAD_PRIORITY_SCHEDULING paths will work as expected there, NetBSD
> also seems to honor calls to pthread_setschedparam() when using SCHED_OTHER.
Thanks for the research. I agree that the problem we are solving here
seems specific to the Linux kernel and we cannot conclude that other
platforms would have the same problem automatically.
> Bionic has a gettid() prototype but does not define the SYS_gettid macro;
> glibc on my box (Fedora 17) on the other hand defines SYS_gettid but does
> not offer a gettid() prototype. To ensure maximum coverage I added both code
> paths.
I'm sorry I missed this comment. This means we can't simply call
syscall(SYS_gettid). It would be nice to add this comment to your
patch (where you define the gettid macro).
Comment 39•12 years ago
|
||
(In reply to Wan-Teh Chang from comment #37)
> reed: how do I set the author? I know just enough Hg to get my job done.
> Thanks.
https://developer.mozilla.org/en-US/docs/Mercurial_FAQ#How_can_I_generate_a_patch_for_somebody_else_to_check-in_for_me.3F
https://developer.mozilla.org/en-US/docs/Mercurial_FAQ#Commit_to_the_local_repository
Hope that helps. :)
Assignee | ||
Comment 40•12 years ago
|
||
This is the patch against mozilla-b2g18, asking for approval to have it landed there too.
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 841962, 841041
User impact if declined: Activities using worker threads can make the phone completely unresponsive.
Testing completed: Tested on Otoro and Unagi devices as well as a desktop build
Risk to taking this patch (and alternatives if risky): The only potential problems are build-related ones but the extensive checks and alternate paths should take care of them. At runtime no ill side-effects can happen as in the worst case the thread priority adjustments will be ignored.
String or UUID changes made by this patch: None
Attachment #719218 -
Flags: approval-mozilla-b2g18?
Comment 41•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 42•12 years ago
|
||
Comment on attachment 719218 [details] [diff] [review]
[PATCH v6 mozilla-b2g18] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values
tef+ bugs have automatic uplift approval
Attachment #719218 -
Flags: approval-mozilla-b2g18?
Comment 43•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g18/rev/82b414f81f1b
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/2b9203badf30
status-b2g18-v1.0.0:
--- → wontfix
status-firefox20:
--- → wontfix
status-firefox21:
--- → wontfix
status-firefox22:
--- → fixed
Comment 44•12 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #43)
> https://hg.mozilla.org/releases/mozilla-b2g18/rev/82b414f81f1b
> https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/2b9203badf30
IMPORTANT WARNING:
You should have used my mozilla-inbound patchset. I think this is my
fault because the patch v6 I attached omitted some changes. Sorry
about the confusion.
A dummy prdepend.h change is highly recommended because NSPR
makefiles don't have proper header dependencies. This patch
adds a new member to the PRThread structure, so if another .c
file (other than ptthread.c) also uses a member of PRThread
but is not recompiled, it will use the old offset of the member
and things will go wrong. A fresh rebuild won't have this
problem.
I also recommend uplifting the new nsprpub/patches/README
and nsprpub/patches/linux-setpriority.patch files.
Comment 46•12 years ago
|
||
I'll close this bug after I have checked in the patch to the NSPR
repository. The NSPR repository is closed to checkins right now
because it is being migrated from CVS to Hg.
Status: REOPENED → ASSIGNED
Priority: -- → P1
Whiteboard: [target 28/2] → [target 28/2][leave open]
Target Milestone: --- → 4.9.6
Version: 4.9.6 → other
Comment 47•12 years ago
|
||
Comment 48•12 years ago
|
||
And yes, when there's a non-obsolete branch-specific patch attached to a bug, I'm naturally going to assume that it's what needs to be landed.
Assignee | ||
Comment 49•12 years ago
|
||
(In reply to Wan-Teh Chang from comment #44)
> You should have used my mozilla-inbound patchset. I think this is my
> fault because the patch v6 I attached omitted some changes. Sorry
> about the confusion.
My fault, I had checked your commit and noticed the change to prdepend.h but then forgot to add it back to the new patch. That'll teach me not to prepare patches after midnight.
Comment 50•12 years ago
|
||
Wan-Teh, this is causing bug 689256 to occur on Linux debug with very high frequency. Normally, that would be cause for automatic backout. However, this is complicated by the fact that this is a tef+ bug that's already been uplifted to the other branches. Sorry for taking so long to figure this out, but it's taken awhile to go back and bisect with the tree in as bad of shape as it was last week :-(
Anyway, but 689256 needs immediate attention or I will have to back this out on trunk. Please look into it ASAP.
Assignee | ||
Comment 51•12 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #50)
> Anyway, but 689256 needs immediate attention or I will have to back this out
> on trunk. Please look into it ASAP.
I'll try to reproduce it tomorrow and see if I can fix it because we need this patch on FxOS, otherwise certain activities can make the phones completely unresponsive. Let's follow up directly in bug 689256.
Bug 841651 tracks the failure of browser_522375.js, a test that seems to rely on timing somehow... That's obviously a bad idea. This patch could have changed some of our timing a little bit, so I am not too surprised that at least one test fails now. I'd say we should disable browser_522375.js rather than back this out, though.
Comment 53•12 years ago
|
||
RyanVM: thank you for letting us know. Gabriele is already
making good progress on tracking down bug 689256. I'll
leave that to his capable hands.
Assignee | ||
Comment 55•12 years ago
|
||
(In reply to Andrew Overholt [:overholt] from comment #54)
> Are things still going alright here, Gabriele?
Yes, the patch already landed on b2g18 and central fixing the problem. The bug is still open only because Wan-Teh still needs to commit it in the NSPR repository when the CVS-to-Mercurial migration will be finished.
Flags: needinfo?(gsvelto)
Comment 56•12 years ago
|
||
Great!
Updated•12 years ago
|
Whiteboard: [target 28/2][leave open] → [target 28/2][leave open][NPOTB]
Comment 57•12 years ago
|
||
Comment on attachment 719156 [details] [diff] [review]
[PATCH v6] Implement PR_SetThreadPriority() on Linux-based platforms using per-thread nice values, by Gabriele Svelto
Patch pushed to the new NSPR hg repository, with Gabriele correctly
listed as the author this time :-)
http://hg.mozilla.org/projects/nspr/rev/39cb7968bf9b
Patch checked in on the NSPR CVS trunk (NSPR 4.9.6).
Checking in configure;
/cvsroot/mozilla/nsprpub/configure,v <-- configure
new revision: 1.339; previous revision: 1.338
done
Checking in configure.in;
/cvsroot/mozilla/nsprpub/configure.in,v <-- configure.in
new revision: 1.342; previous revision: 1.341
done
Checking in pr/include/private/primpl.h;
/cvsroot/mozilla/nsprpub/pr/include/private/primpl.h,v <-- primpl.h
new revision: 3.97; previous revision: 3.96
done
Checking in pr/src/pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v <-- ptthread.c
new revision: 3.102; previous revision: 3.101
done
Attachment #719156 -
Flags: checked-in+
Updated•12 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Whiteboard: [target 28/2][leave open][NPOTB] → [target 28/2][NPOTB]
Comment 58•12 years ago
|
||
Comment 59•12 years ago
|
||
I think this should be backed out. I've filed bug 861434 explaining why. I'd like to file a separate bug for re-implementing the functionality desired in this bug.
Updated•12 years ago
|
Whiteboard: [target 28/2][NPOTB] → [target 28/2]
You need to log in
before you can comment on or make changes to this bug.
Description
•