Closed Bug 757593 Opened 13 years ago Closed 11 years ago

PR_IntervalNow not monotonic for android / linux / Mac OS X

Categories

(NSPR :: NSPR, defect, P1)

x86_64
Linux
defect

Tracking

(blocking-basecamp:+, firefox18 fixed, firefox19 fixed, firefox20 fixed)

RESOLVED FIXED
blocking-basecamp +
Tracking Status
firefox18 --- fixed
firefox19 --- fixed
firefox20 --- fixed

People

(Reporter: mcmanus, Assigned: wtc)

References

(Blocks 1 open bug)

Details

Attachments

(7 files, 3 obsolete files)

PR_IntervalNow() on linux (confirmed) and I suspect os x and android is not monotonic.. as it says right in the comments.

http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/misc/prinrval.c#42

Given that function is used a lot in the firefox code base, and pr_intervals don't work right without that property, we should try and fix that up :)

I know in modern linux the kernel actually works pretty hard at keeping gtod (which is what pr_interval uses) monotonic for normal uese.. you won't see a problem due to any normal version of smp or ntp or any common drift issue like that. but if someone sets the clock in a different application then it all goes up in flames because gtod is wall clock time.

on linux and android we can do clock_gettime(CLOCK_MONOTONIC) that will do what we want. I don't know if anything can be done for os x - but it shouldn't stop us from fixing up android as a very impt platform.
Thank you for the bug report.  This is closely related to bug 752783.
We can fix both bugs in one shot.
Version: 4.10 → other
Patrick, don't you have a patch for this? Do you think that patch has to be redone or is it something that may just need a review?
(In reply to Brian Smith (:bsmith) from comment #2)
> Patrick, don't you have a patch for this? Do you think that patch has to be
> redone or is it something that may just need a review?

my patch was just for testing to see if we could produce a non monotonic result.. it just subbed in clock_gettime for gettimeofday and converted it to ms.

a real patch would need to check the os, etc.. it sounds like nspr would also like to work with better resolution.
mcmanus: please attach your patch for my reference.  Thanks.

I am working on a patch.  I just need to clean it up and
do some testing and measurement.
(In reply to Wan-Teh Chang from comment #4)
> mcmanus: please attach your patch for my reference.  Thanks.
> 
> I am working on a patch.  I just need to clean it up and
> do some testing and measurement.

https://bugzilla.mozilla.org/show_bug.cgi?id=749890#c67
mcmanus: thank you.  Your patch is essentially the same as mine.

As requested in bug 752783, I increased the resolution of PRIntervalTime
to the maximum allowed by NSPR: 10 nanoseconds.  The current resolution
of PRIntervalTime is 1 millisecond, so that's a 100-fold improvement.

clock_getres(CLOCK_MONOTONIC) reports a resolution of 1 nanosecond on my
Ubuntu 10.04.2 LTS computer.  My testing confirms this resolution.  In
contrast, my testing shows the resolution of gettimeofday() is 2-3
microseconds on the same computer.

The overhead of clock_gettime(CLOCK_MONOTONIC) is higher than the overhead
of gettimeofday(): 219 nanoseconds vs. 87 nanoseconds, as measured by the
NSPR 'inrval' test program:
http://mxr.mozilla.org/nspr/source/nsprpub/pr/tests/inrval.c
wtc, is Patrick's patch really the same as yours, besides the 10 nanosecond resolution? If so, we can post a modified version of it for you to r+. Otherwise, could you post your patch for somebody to r+?

It seems like people at Mozilla are willing to accept the 219 nanoseconds vs. 87 nanoseconds overhead difference. Also, in NSS, I did not see any use of PR_IntervalNow that seemed like it would be particularly sensitive to this difference, as far as poor perforance is concerned.
blocking-basecamp: --- → ?
I marked this tracking-firefox18 to notify release-drivers that we want to fix this bug for B2G v1 which apparently means fixing it for Firefox Aurora too.
Blocks: 803846
Attached patch Patch v1 (obsolete) (deleted) — Splinter Review
Please review. The following files are the most important in this patch.

1. mozilla/nsprpub/pr/include/md/_darwin.h: an example of a platform that
uses the current gettimeofday (GTOD) based implementation. For Mac OS X
and iOS, this needs to be replaced, too.

2. mozilla/nsprpub/pr/include/md/_linux.h: an example of a platform that
will use the new clock_gettime(CLOCK_MONOTONIC) based implementation.

3. mozilla/nsprpub/pr/include/md/_unixos.h: the common function declarations
and macro definitions for the gettimeofday based implementation are moved
here to avoid code duplication.

4. mozilla/nsprpub/pr/src/Makefile.in: Does Android have -lrt, too?

5. mozilla/nsprpub/pr/src/md/unix/unix.c
Attachment #674038 - Flags: review?(bsmith)
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → 4.9.4
Attached patch Patch v2 (obsolete) (deleted) — Splinter Review
Add a better implementation for iOS and Mac OS X.

Move the comment for PR_IntervalNow() that Patrick pointed out to
the gettimeofday based implementation. Not all implementations of
PR_IntervalNow() are based on "the time of day capability offered
by the system".
Attachment #674038 - Attachment is obsolete: true
Attachment #674038 - Flags: review?(bsmith)
Attachment #674065 - Flags: review?(bsmith)
Comment on attachment 674065 [details] [diff] [review]
Patch v2

Review of attachment 674065 [details] [diff] [review]:
-----------------------------------------------------------------

I verified that this implementation seems to match the other implementations we have (mozilla::TimeStamp and chromium/src/base/time_posix.cc). 

Some Linux kernels (2.6.28+) support CLOCK_MONOTONIC_RAW. The difference is that CLOCK_MONOTONIC can be "slewed" (slightly skewed in a monotonic fashion) by NTP, whereas CLOCK_MONOTONIC_RAW is never affected by NTP. Therefore, it seems better to use CLOCK_MONOTONIC_RAW when it is available, in the instances where we are using PR_IntervalNow() for performance measurements. But, this would require a runtime check of the kernel version. Perhaps it would be better to make that improvement in a follow-up bug.

FYI: Some Linux kernels before 2.6.32.19 have a bug where even CLOCK_MONOTONIC is not actually monotonic:

    commit 0696b711e4be45fa104c12329f617beb29c03f78
    Author: Lin Ming <ming.m.lin@intel.com>
    Date:   Tue Nov 17 13:49:50 2009 +0800

    timekeeping: Fix clock_gettime vsyscall time warp

    Since commit 0a544198 "timekeeping: Move NTP adjusted clock
    multiplier to struct timekeeper" the clock multiplier of vsyscall is updated with
    the unmodified clock multiplier of the clock source and not with the
    NTP adjusted multiplier of the timekeeper.

    This causes user space observerable time warps:
    new CLOCK-warp maximum: 120 nsecs,  00000025c337c537 -> 00000025c337c4bf

B2G is using a 3.2+ kernel so it should be not affected by this.

::: mozilla/nsprpub/pr/src/md/unix/unix.c
@@ +3019,5 @@
>  
> +#if defined(_MD_INTERVAL_USE_GTOD)
> +/*
> + * This version of interval times is based on the time of day
> + * capability offered by system. This isn't valid for two reasons:

s/by system/by the system/
Attachment #674065 - Flags: review?(bsmith) → review+
I also verified that this builds on all platforms (verifying that Android has -lrt), by applying this patch to mozilla-central and pushing this patch to try:
https://tbpl.mozilla.org/?tree=Try&rev=892f8829a299
Summary: PR_IntervalNow not monotonic for android / linux → PR_IntervalNow not monotonic for android / linux / Mac OS X
(In reply to Brian Smith (:bsmith) from comment #11)
> Some Linux kernels (2.6.28+) support CLOCK_MONOTONIC_RAW.

I tried builds with CLOCK_MONOTONIC_RAW but I found that the Android NDK used by Fennec and B2G do not define CLOCK_MONOTONIC_RAW, so I suggest that if we do add support for CLOCK_MONOTONIC_RAW, we do so in another bug.
I applied this patch to my local copy of mozilla-central and I found that the WebGL tests are consistently failing for Linux and Mac on the try push I did: https://tbpl.mozilla.org/?tree=Try&rev=33160e60dd3c

CC:bz to get a suggestion of who can help, as unfortunately I usually do not use Linux or a Mac so I haven't had a chance to debug this further. 

I see these tests are calling the Javascript DOM function window.setTimeout(20000);

I suspect there might be a problem with PR_IntervalToMilliseconds, because Mozilla's timeout code is calling PR_IntervalToMilliseconds like so (http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#9533):

// Two timer deadlines must differ by less than half the PRIntervalTime domain.
#define DELAY_INTERVAL_LIMIT    PR_BIT(8 * sizeof(PRIntervalTime) - 1)

...

// The longest interval (as PRIntervalTime) we permit, or that our
// timer code can handle, really. See DELAY_INTERVAL_LIMIT in
// nsTimerImpl.h for details.
#define DOM_MAX_TIMEOUT_VALUE    DELAY_INTERVAL_LIMIT

...

uint32_t maxTimeoutMs = PR_IntervalToMilliseconds(DOM_MAX_TIMEOUT_VALUE);
if (static_cast<uint32_t>(interval) > maxTimeoutMs) {
  interval = maxTimeoutMs;
}

Another possibility of a problem area is 
http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#9831:

  // Make sure to cast the unsigned PR_USEC_PER_MSEC to signed
  // PRTime to make the division do the right thing on 64-bit
  // platforms whether delay is positive or negative (which we
  // know is always positive here, but cast anyways for
  // consistency).
  nsresult rv = aTimeout->InitTimer(TimerCallback, delay.ToMilliseconds());

Notice how the comment talks about explicit casts but there is no cast!
> Notice how the comment talks about explicit casts but there is no cast!

The comment is stale.  There used to be a division there for the second argument back when this stuff was using explicily PRIntervalTime instead of TimeDuration, and that's what the comment is talking about.  But at this point all that stuff is inside TimeDuration::ToMilliseconds, which works with doubles anyway, so is not subject to the integer promotion issues that comment is about.  The comment should just go away.

As for the DELAY_INTERVAL_LIMIT bit, that shouldn't be an obvious problem.  Specifically, in the new setup it looks like PRIntervalTime is measured in units of 1/100000 second.  PRIntervalTime is a 32-bit quantity, so DELAY_INTERVAL_LIMIT there ends up being (1<<31), which is about 2e9.  Dividing by 1e5 gives about 2e4 seconds.  And in particular, maxTimeoutMs should be ending up as somewhere around 2e7 or so....

I guess I can take a look at this and see if anything jumps out at me.  Building now.
OK, I have a (Mac) build with the patch attached to this bug.

setTimeout(function(){}, 20000) seems to work right.

Running the WebGL test, it does indeed seem to time out.

Running https://www.khronos.org/registry/webgl/sdk/tests/conformance/context/context-lost.html directly, it also seems to hang forever without calling testLostContext....
Ah.  So the problem is that there's a WebGL bug here.  In particular, the WebGL code on context loss arms a timer to report the context loss.  It does it like so:

616         mContextRestorer->InitWithCallback(static_cast<nsITimerCallback*>(this),
617                                            PR_MillisecondsToInterval(1000),
618                                            nsITimer::TYPE_ONE_SHOT);

But the second argument of InitWithCallback is a delay, measured in milliseconds.  So with the new 1e-5s tick this is passing in "100s", which is longer than the 20s timeout the test harness is using.

Fixing the above code (in content/canvas/src/WebGLContext.h) should help.
Sorry, I don't understand. If the function PR_MillisecondsToInterval is sane, its argument should be milliseconds (else it should be renamed). But if its argument is 1e-5s ticks, then 1000 still wouldn't be 100s, rather it would be 0.01s.
(In reply to Benoit Jacob [:bjacob] from comment #18)
> Sorry, I don't understand. If the function PR_MillisecondsToInterval is
> sane, its argument should be milliseconds (else it should be renamed). But
> if its argument is 1e-5s ticks, then 1000 still wouldn't be 100s, rather it
> would be 0.01s.

I think the second argument to initwithcallback is an unsigned long of milliseconds, and the code as written is passing in a pr_interval type.
Attached patch it's milliseconds (deleted) — Splinter Review
Attachment #674778 - Flags: review?(bzbarsky)
Comment on attachment 674778 [details] [diff] [review]
it's milliseconds

Yep, exactly.
Attachment #674778 - Flags: review?(bzbarsky) → review+
http://hg.mozilla.org/integration/mozilla-inbound/rev/d400b4e402f3
Whiteboard: [leave open]
Attached patch Patch v3 (deleted) — Splinter Review
I changed the comment as Brian suggested and checked this
in on the NSPR trunk (NSPR 4.9.4).

I am open to using the minimum ticks per second allowed
by NSPR (1000). This way PRIntervalTime (a 32-bit time
stamp) will take as long as possible to wrap around.

Checking in pr/include/md/_aix.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_aix.h,v  <--  _aix.h
new revision: 3.18; previous revision: 3.17
done
Checking in pr/include/md/_bsdi.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_bsdi.h,v  <--  _bsdi.h
new revision: 3.14; previous revision: 3.13
done
Checking in pr/include/md/_darwin.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_darwin.h,v  <--  _darwin.h
new revision: 3.27; previous revision: 3.26
done
Checking in pr/include/md/_dgux.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_dgux.h,v  <--  _dgux.h
new revision: 3.9; previous revision: 3.8
done
Checking in pr/include/md/_freebsd.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_freebsd.h,v  <--  _freebsd.h
new revision: 3.24; previous revision: 3.23
done
Checking in pr/include/md/_hpux.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_hpux.h,v  <--  _hpux.h
new revision: 3.26; previous revision: 3.25
done
Checking in pr/include/md/_linux.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_linux.h,v  <--  _linux.h
new revision: 3.62; previous revision: 3.61
done
Checking in pr/include/md/_netbsd.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_netbsd.h,v  <--  _netbsd.h
new revision: 3.21; previous revision: 3.20
done
Checking in pr/include/md/_nto.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_nto.h,v  <--  _nto.h
new revision: 3.13; previous revision: 3.12
done
Checking in pr/include/md/_openbsd.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_openbsd.h,v  <--  _openbsd.h
new revision: 3.12; previous revision: 3.11
done
Checking in pr/include/md/_osf1.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_osf1.h,v  <--  _osf1.h
new revision: 3.19; previous revision: 3.18
done
Checking in pr/include/md/_qnx.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_qnx.h,v  <--  _qnx.h
new revision: 3.7; previous revision: 3.6
done
Checking in pr/include/md/_riscos.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_riscos.h,v  <--  _riscos.h
new revision: 3.5; previous revision: 3.4
done
Checking in pr/include/md/_scoos.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_scoos.h,v  <--  _scoos.h
new revision: 3.9; previous revision: 3.8
done
Checking in pr/include/md/_symbian.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_symbian.h,v  <--  _symbian.h
new revision: 1.4; previous revision: 1.3
done
Checking in pr/include/md/_unixos.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_unixos.h,v  <--  _unixos.h
new revision: 3.43; previous revision: 3.42
done
Checking in pr/include/md/_unixware.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_unixware.h,v  <--  _unixware.h
new revision: 3.9; previous revision: 3.8
done
Checking in pr/src/Makefile.in;
/cvsroot/mozilla/nsprpub/pr/src/Makefile.in,v  <--  Makefile.in
new revision: 1.63; previous revision: 1.62
done
Checking in pr/src/md/unix/darwin.c;
/cvsroot/mozilla/nsprpub/pr/src/md/unix/darwin.c,v  <--  darwin.c
new revision: 3.13; previous revision: 3.12
done
Checking in pr/src/md/unix/unix.c;
/cvsroot/mozilla/nsprpub/pr/src/md/unix/unix.c,v  <--  unix.c
new revision: 3.61; previous revision: 3.60
done
Checking in pr/src/misc/prinrval.c;
/cvsroot/mozilla/nsprpub/pr/src/misc/prinrval.c,v  <--  prinrval.c
new revision: 3.11; previous revision: 3.10
done
Attachment #674065 - Attachment is obsolete: true
Change to 1000 ticks per second (NSPR minimum), down from 100000
ticks per second (NSPR maximum), so that PRIntervalTime (a 32-bit
time stamp) will take as long as possible to wrap around. See

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/include/prinrval.h&rev=3.9&mark=29,39-40#25

Miscellaneous cleanup:

1. Rename HAVE_MONOTONIC_CLOCK to HAVE_CLOCK_MONOTONIC to follow
the naming convention of feature detection macros (the feature is
CLOCK_MONOTONIC).

2. Rename _PR_UNIX_GetInterval2 to _PR_CGT_GetInterval, where "CGT"
stands for "clock_gettime". (I'd appreciate suggestions of better
names.)
Attachment #674887 - Flags: review?(bsmith)
https://hg.mozilla.org/mozilla-central/rev/d400b4e402f3
Could this change have broken the SunOS NSS tinderbox buildnss04?
http://tinderbox.mozilla.org/showbuilds.cgi?tree=NSS
Attached patch Patch for _solaris.h (deleted) — Splinter Review
Kai: you are right. I missed this change. Please review this patch.

I also discovered the PRIntervalTime function declarations and macro
definitions were repeated at the end of _solaris.h. So I removed them.
That part of the file is inside a #if !defined(_PR_PTHREADS) block, so
I also removed an unnecessary #ifndef _PR_PTHREADS test. Thanks.

Note: line 144 is the beginning of the #if !defined(_PR_PTHREADS) block,
and line 494 is the end of the block.

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/include/md/_solaris.h&rev=3.31&mark=137,144#137

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/include/md/_solaris.h&rev=3.31&mark=494#470 

I've checked in this patch to fix the Solaris build error.

Checking in _solaris.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_solaris.h,v  <--  _solaris.h
new revision: 3.32; previous revision: 3.31
done
Attachment #675122 - Flags: review?(kaie)
Comment on attachment 675122 [details] [diff] [review]
Patch for _solaris.h

If the build goes green, r=kaie
Attachment #675122 - Flags: review?(kaie) → review+
(In reply to Brian Smith (:bsmith) from comment #8)
> I marked this tracking-firefox18 to notify release-drivers that we want to
> fix this bug for B2G v1 which apparently means fixing it for Firefox Aurora
> too.

Hey Brian,would be helpful to know : what is the impact of this bug on desktop/mobile . Anything user-facing ? Would it help with performance ? Would we know of regressions sooner than later, if any if this patch was uplifted ? 

Also waiting on blocking-basecamp : ? to make sure its needed by b2g
Flags: needinfo?(bsmith)
We'll wait on the blocking-basecamp decision and a nomination for uplift before considering this for FF18.
See comment 24 for a description of the changes. I omitted the
renaming of _PR_UNIX_GetInterval2 to _PR_CGT_GetInterval from
the patch because I'm not sure if it's easy to figure out what
"CGT" stands for. I'd still appreciate suggestions of a better
name for _PR_UNIX_GetInterval2.

Patch checked in on the NSPR trunk (NSPR 4.9.4).

Checking in pr/include/md/_linux.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_linux.h,v  <--  _linux.h
new revision: 3.63; previous revision: 3.62
done
Checking in pr/include/md/_unixos.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_unixos.h,v  <--  _unixos.h
new revision: 3.44; previous revision: 3.43
done
Checking in pr/src/md/unix/darwin.c;
/cvsroot/mozilla/nsprpub/pr/src/md/unix/darwin.c,v  <--  darwin.c
new revision: 3.14; previous revision: 3.13
done
Checking in pr/src/md/unix/unix.c;
/cvsroot/mozilla/nsprpub/pr/src/md/unix/unix.c,v  <--  unix.c
new revision: 3.62; previous revision: 3.61
done
Attachment #674887 - Attachment is obsolete: true
Attachment #674887 - Flags: review?(bsmith)
Attachment #680346 - Flags: review?(bsmith)
Pushed to mozilla-inbound as part of NSPR_4_9_4_BETA1:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0d124d4f28b0
https://hg.mozilla.org/mozilla-central/rev/0d124d4f28b0
> We'll wait on the blocking-basecamp decision and a nomination for uplift before considering this for 
> FF18.

When triaging, it would be helpful if you'd check out bug dependencies.  In this case, bug 803846 is a basecamp blocker.

I just bug 803846 here -- see that bug for discussion of why this is a B2G blocker.
blocking-basecamp: ? → +
Clearing needinfo; we've established that this is a basecamp blocker.

Let's get this on Aurora.
Flags: needinfo?(bsmith)
NSPR_4_9_4_RTM pushed to mozilla-inbound. The only change is the
version string:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0ae3e664c2a5

NSPR_4_9_4_BETA2 is in mozilla-aurora:
http://mxr.mozilla.org/mozilla-aurora/source/nsprpub/TAG-INFO
Since the only difference from NSPR_4_9_4_RTM is the version string,
it is NOT necessary to update mozilla-aurora to NSPR_4_9_4_RTM right
away.

If we need to update the NSPR in any other mozilla branch, please use
the NSPR_4_9_4_RTM tag.
Whiteboard: [leave open]
We still need this on beta, right?
Yes, mozilla-beta is still using NSPR_4_9_2_RTM. We need to
push NSPR_4_9_4_RTM to mozilla-beta using the procedure in
https://developer.mozilla.org/en-US/docs/Updating_NSPR_or_NSS_in_mozilla-central

Justin, can you do that? Ideally Ted or Brian should do that,
but I don't know if they are around this week.
I can help to push it now, but I can't watch trees.
Update mozilla beta 18 to NSPR 4.9.4 RTM, thanks to Wan-Teh for creating the release tag.

https://hg.mozilla.org/releases/mozilla-beta/rev/e0d2cead16d0

Can Mozilla drivers please set the right flags? Thanks.
Just a small note, despite client.py having the fix from bug 782784, my commit didn't contain a change to the force-rebuild file prdepend.h

So, just in case you get build failures with this landing, 
please edit nsprpub/config/prdepend.h
and append a blank line, and commit,
that will force a full rebuild.
(In reply to Wan-Teh Chang from comment #37)
> 
> Since the only difference from NSPR_4_9_4_RTM is the version string,
> it is NOT necessary to update mozilla-aurora to NSPR_4_9_4_RTM right
> away.


Well, I have a warning bot, that sends out warnings emails if aurora/beta contain beta versions.

It would be nice to fix the version string on aurora.

If you gave appropriate approvals to this bug, I could push that version number change into Aurora.
https://hg.mozilla.org/mozilla-central/rev/0ae3e664c2a5
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment on attachment 680346 [details] [diff] [review]
Change to 1000 ticks per second, miscellaneous cleanup, v2

Review of attachment 680346 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the belated review of the already-checked-in patch.

::: mozilla/nsprpub/pr/src/md/unix/darwin.c
@@ +35,4 @@
>       */
>      time = mach_absolute_time();
> +    time = time * machTimebaseInfo.numer / machTimebaseInfo.denom /
> +           PR_NSEC_PER_MSEC;

Should we be more concerned about overflow here, e.g. by doing it this way?:

time = time / PR_NSEC_PER_MSEC * machTimebaseInfo.numer / machTimebaseInfo.denom;
Attachment #680346 - Flags: review?(bsmith) → review+
Comment on attachment 680346 [details] [diff] [review]
Change to 1000 ticks per second, miscellaneous cleanup, v2

[Approval Request Comment]

This approval request is not for this patch exactly, but rather just for syncing mozilla-aurora with mozilla-central and mozilla-beta, by updating the version strings. See comment 37: "NSPR_4_9_4_BETA2 is in mozilla-aurora:
http://mxr.mozilla.org/mozilla-aurora/source/nsprpub/TAG-INFO Since the only difference from NSPR_4_9_4_RTM is the version string, it is NOT necessary to update mozilla-aurora to NSPR_4_9_4_RTM right away."

and see Kai's response in comment 43: "Well, I have a warning bot, that sends out warnings emails if aurora/beta contain beta versions. It would be nice to fix the version string on aurora. If you gave appropriate approvals to this bug, I could push that version number change into Aurora."
Attachment #680346 - Flags: approval-mozilla-aurora?
Attachment #680346 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Brian Smith (:bsmith) from comment #45)
>
> Should we be more concerned about overflow here, e.g. by doing it this way?:
> 
> time = time / PR_NSEC_PER_MSEC * machTimebaseInfo.numer /
> machTimebaseInfo.denom;

I considered this issue. This is a difficult question because the flip
side is the loss of precision. I chose the current form:

> +    time = time * machTimebaseInfo.numer / machTimebaseInfo.denom /
> +           PR_NSEC_PER_MSEC;

because time = time * machTimebaseInfo.numer / machTimebaseInfo.denom
is apparently how one is supposed to convert the mach time base to
nanoseconds:
http://developer.apple.com/library/mac/#qa/qa1398/_index.html

I assume Apple must have taken care to not break that code. (On current
versions of Mac OS X, both machTimebaseInfo.numer and machTimebaseInfo.denom
are 1.)
(In reply to Brian Smith (:bsmith) from comment #45)
> Comment on attachment 680346 [details] [diff] [review]
>
> Should we be more concerned about overflow here, e.g. by doing it this way?:
> 
> time = time / PR_NSEC_PER_MSEC * machTimebaseInfo.numer /
> machTimebaseInfo.denom;

This test program seems to suggest that dividing by PR_NSEC_PER_MSEC
first could result in a loss of precision for some inputs.

I use 1000 to simulate PR_NSEC_PER_MSEC.
numer=5
demon=8

The expected result is the result computed using 'double',
*truncated* after five digits.

wtc-macbookair:nspr-tip wtc$ gcc foo.c
wtc-macbookair:nspr-tip wtc$ ./a.out
14660
14660
14660
1.466049e+04
61728
61725
61728
6.172840e+04
48617
48615
48618
4.861801e+04
41659
41655
41659
4.165969e+04
27770
27770
27771
2.777111e+04

In the third and fifth test cases, dividing by 1000 last gets
the most accurate results.

So, *if* overflow cannot happen, dividing by PR_NSEC_PER_MSEC last should
be the best.

I found Mac OS X's nanosleep.c source code on the Web. It checks numer=1
and denom=1 as a special case. For other values of numer and denom, it
seems to do 128-bit integer arithmetic to avoid overflow.
This patch works. It avoids the need to convert Mach absolute time
to nanoseconds. But it is 30 times slower than mach_absolute_time().
So we can't use it.
Depends on: 824742
(In reply to Brian Smith (:bsmith) from comment #46)
> 
> and see Kai's response in comment 43: "Well, I have a warning bot, that
> sends out warnings emails if aurora/beta contain beta versions. It would be
> nice to fix the version string on aurora. If you gave appropriate approvals
> to this bug, I could push that version number change into Aurora."

could we please get this done, in order to silence our daily panic warning email bot?
Attachment #699931 - Flags: approval-mozilla-beta?
Alex, the patch to fix the version string is very low risk.
Comment on attachment 699931 [details] [diff] [review]
fix nspr version number on beta branch

Sounds like we don't have any in-product dependencies on the version number itself (thus the low risk evaluation), so we can take this correctness fix.
Attachment #699931 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Thanks a lot for the approval.
https://hg.mozilla.org/releases/mozilla-beta/rev/c1e1a12e6e32
The checkin for this bug broke the NSPR build on some old Linux platforms that don't support CLOCK_MONOTONIC .

I think the fix is probably to test for this OS feature in autoconf and conditionally define HAVE_CLOCK_MONOTONIC in _linux.h .

I am adding Ashwani to the cc list who will take it from here.\
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Please open a new bug report for the build error. Thanks.
Status: REOPENED → RESOLVED
Closed: 12 years ago11 years ago
Resolution: --- → FIXED
Blocks: 1244320
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: