Closed
Bug 822490
Opened 12 years ago
Closed 12 years ago
Radically simplify windows TimeStamp implementation
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla21
People
(Reporter: mayhemer, Assigned: mayhemer)
References
(Depends on 1 open bug)
Details
Attachments
(1 file, 3 obsolete files)
(deleted),
patch
|
robarnold
:
review+
mayhemer
:
checkin+
|
Details | Diff | Splinter Review |
After analyzing bug 765215 I found out that the calibration code is too inaccurate, fragile and needs a lot of synchronization, flags and is very non-deterministic.
I want to use the fact we never directly use TimeStamp::Now() value but we always differ two timestamps (epoch and now). Based on that I want TimeStamp::Now() remember both QPC and GTC values. When we count the difference (override for the - operator) I want to do the following:
- compare skew (=QPC-GTC) of both timestamps
- if the difference of the two skews falls under some threshold, use QPCa - QPCb as a result for TimeDuration
- if not, use GTCa - GTCb as a result
- also, if the skew differs too much but we've recorded both stamps in some short time each after other (say 4 seconds) then switch a flag to stop using QPC at all since it is not reliable
Simple and perfectly working w/o any locks at all, handling also the wake up issue.
I will leave TSC and WIN64 api for GetTickCount as an optimization.
-----------------------------------------------
The main reason to transit this way is a big issue after waking up when GTC and QPC skew may radically change and we may jump in the TimeStamp::Now() result back. To explain more in detail:
- we record a skew, QPC and GTC are stable
- we sleep the machine
- we wake up the machine after say 2 minutes
- we access TimeStamp::Now() which detects that QPC and GTC are too far each from other -> falls back to use GTC + skew ; the skew determined previously
- we detect wake up and force recalibration
- we change the skew to a radically different number and start using QPC again for the result
=> but the different between GTC + previous skew and QPC is going back too much, so we for a long time (may be hours!) return the same value from TimeStamp::Now() that is doing simply |return MAX(now, lastResult);|. lastResult is the last GTC + skew while QPC is much smaller number then |GTC + skew|, since QPC has been reset or simply jumped far from its expected monotonicity
Comment 1•12 years ago
|
||
I think this is a good idea.
Assignee | ||
Comment 2•12 years ago
|
||
- implemented according the description
- when TSC is unstable, we use QPC values when two stamps are up to 2 seconds each from other (according GTC) ; for longer times hi precision is not needed and GTC is used
- when two time stamps have been taken within a two seconds interval, we do skew difference check as in the current implementation ; if out of bounds, QPC is disabled just by flipping a flag
- this approach doesn't need any special handling of suspend/wakeup (if shorter then 2 seconds, which in 99.99% cases is) since potentially unstable QPC is used and checked only for shorter intervals
Attachment #706167 -
Flags: review?(tellrob)
Assignee | ||
Comment 3•12 years ago
|
||
This one is better (fixed hard failure interval calculation, between else).
Try: https://tbpl.mozilla.org/?tree=Try&rev=c4626c0482db
- implemented according the description
- we do QPC/GTC skew difference check when subtracting two time stamps
- when out of bound, GTC is used
- when out of bound AND the two time stamps have been taken within a two seconds interval, QPC is completely disabled (just by flipping a flag)
- this approach doesn't need any special handling of suspend/wakeup since GTC will be preferred over potentially unstable QPC automatically
Attachment #706167 -
Attachment is obsolete: true
Attachment #706167 -
Flags: review?(tellrob)
Attachment #706202 -
Flags: review?(tellrob)
Comment 4•12 years ago
|
||
Comment on attachment 706202 [details] [diff] [review]
v1.1
Review of attachment 706202 [details] [diff] [review]:
-----------------------------------------------------------------
I have some questions about this but I like the idea (I think this is what Microsoft wanted us to do originally when we started using QPC).
"I love this plan! I'm excited to be a part of it! LET'S DO IT!"
::: xpcom/ds/TimeStamp.h
@@ +159,5 @@
>
> return TimeDuration::FromTicks(int64_t(aTicks));
> }
>
> + // Duration, result is implementation-specific difference of two TimeStamps
Hmm, someone else should probably sign off on this change (sr time?)
::: xpcom/ds/TimeStamp_windows.cpp
@@ +61,5 @@
> +static const uint32_t kQPCHardFailureDetectionInterval = 2000;
> +
> +// On every use of QPC values we check the overflow of skew difference of the
> +// two stamps doesn't go over this number of milliseconds. Both timer
> +// functions jitter so we have to have some limit. The value is based on tests.
Which tests? Can you attach them to the bug?
@@ +133,5 @@
> +// ----------------------------------------------------------------------------
> +
> +// Initially true, set to false when QPC is found unstable and never
> +// returns back to true since that time.
> +bool volatile sUseQPC = true;
These two should be static, right?
@@ +248,5 @@
> // jump by 32 ms in two steps, therefor use the ceiling value.
> sUnderrunThreshold =
> LONGLONG((-2) * ticksPerGetTickCountResolutionCeiling);
>
> + // QPC should go no further then 2 * GTC resolution.
nit (while you're here...): s/then/than
@@ +410,5 @@
> + if (aDuration < sQPCHardFailureDetectionInterval) {
> + // Interval between the two time stamps is very short, consider
> + // QPC as unstable and disable it completely.
> + sUseQPC = false;
> + LOG(("TimeStamp: QPC disabled"));
So if we ever compare two timestamps that are too close together, QPC gets turned off for everyone?
Assignee | ||
Comment 5•12 years ago
|
||
(In reply to Rob Arnold [:robarnold] from comment #4)
> Comment on attachment 706202 [details] [diff] [review]
> v1.1
>
> Review of attachment 706202 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I have some questions about this but I like the idea (I think this is what
> Microsoft wanted us to do originally when we started using QPC).
>
> "I love this plan! I'm excited to be a part of it! LET'S DO IT!"
Wish I remember where the idea came to me :)
>
> ::: xpcom/ds/TimeStamp.h
> @@ +159,5 @@
> >
> > return TimeDuration::FromTicks(int64_t(aTicks));
> > }
> >
> > + // Duration, result is implementation-specific difference of two TimeStamps
>
> Hmm, someone else should probably sign off on this change (sr time?)
I'm just changing the comment, but the value has already been like this.
>
> ::: xpcom/ds/TimeStamp_windows.cpp
> @@ +61,5 @@
> > +static const uint32_t kQPCHardFailureDetectionInterval = 2000;
> > +
> > +// On every use of QPC values we check the overflow of skew difference of the
> > +// two stamps doesn't go over this number of milliseconds. Both timer
> > +// functions jitter so we have to have some limit. The value is based on tests.
>
> Which tests? Can you attach them to the bug?
Unfortunatelly not. Those were just little console programs exercising the GTC and QPC APIs on my old intel based winxp machine. It was very empiric to watch what happens, and the numbers are still more guesses then exact numbers. However, in real usage it showed up to be more or less tuned. Also, I no longer posses that xp machine to test with unfortunately.
> @@ +410,5 @@
> > + if (aDuration < sQPCHardFailureDetectionInterval) {
> > + // Interval between the two time stamps is very short, consider
> > + // QPC as unstable and disable it completely.
> > + sUseQPC = false;
> > + LOG(("TimeStamp: QPC disabled"));
>
> So if we ever compare two timestamps that are too close together, QPC gets
> turned off for everyone?
No. If the difference goes too far over the basic limit *and* we are under 2 seconds duration of the two comparing time stamps, then we disable QPC completely.
The calculation is following:
SKEW1 = QPC1 - GTC1
SKEW2 = QPC2 - GTC2
DIFF = SKEW1 - SKEW2 // can jitter since GTC increases in 15.6 ms intervals (usually)
IS_POTENTIALLY_OVERFLOWING = DIFF not in [-32, 32] // 32 is simplified value of default GTC time increment * 2
OVERFLOW = ABS(DIFF) - 32 **
IS_OVERFLOWING = OVERFLOW > 100ms
if (IS_OVERFLOWING && ABS(GTC1 - GTC2) < 2s) -> bye bye QPC
** Before this line DIFF is linearly adjusted as the two time stamps would be taken with 2 seconds delay each after other. Thus, when you compare two stamps one taken at 10000ms and other at 30000ms, and DIFF is, say, 250ms, we do: DIFF = (2000 / (30000 - 10000)) * DIFF , giving DIFF = 25ms. The 100ms overflow per 2000ms is something like giving 5% tolerance to QPC jitter.
Except some issues with overflow calculation (that can for large numbers internally overflow and potentially give wrong numbers, according my logging - should be fixed even in the current version) this works well.
However, I'm thinking of some penalty system here while also tightening the limit to some 2% only. Say 3 problems in 10 seconds mean to stop using QPC, but here I would guess since I don't have a machine to test with.
Also, usually problems with QPC are on exotic systems like 64bit winxp on AMD processors. On my win7/full intel machine QPC works well. On my old intel core laptop with win8 it has usual wakeup issues, that this code perfectly detects.
Assignee | ||
Comment 6•12 years ago
|
||
(In reply to Rob Arnold [:robarnold] from comment #4)
> Comment on attachment 706202 [details] [diff] [review]
> v1.1
> > + // Duration, result is implementation-specific difference of two TimeStamps
>
> Hmm, someone else should probably sign off on this change (sr time?)
As I'm thinking of it...
1) the comment was misplaced all the time
2) this is just an internal variable that hold the time in "some" form/units
The important thing is that ToSeconds() will convert this internal value to represent actual seconds. What mValue holds doesn't at all matter.
Also, I'm not sure I really need to define my own copy constructor and the assignment operator. What do you think? Implicit will do the same I believe.
Comment 7•12 years ago
|
||
Comment on attachment 706202 [details] [diff] [review]
v1.1
Review of attachment 706202 [details] [diff] [review]:
-----------------------------------------------------------------
I'd prefer to see a more lenient way of disabling QPC but that could be in a separate patch. I worry that since arithmetic is allowed on timestamps, it is possible to erroneously trigger the heuristic.
::: xpcom/ds/TimeStamp_windows.h
@@ +27,5 @@
> + bool CheckQPC(int64_t aDuration, const TimeStampValue &aOther) const;
> +
> +public:
> + struct _SomethingVeryRandomHere;
> + TimeStampValue(_SomethingVeryRandomHere* nullValue);
I should have asked about this...some new Mozilla convention I'm unaware of?
@@ +28,5 @@
> +
> +public:
> + struct _SomethingVeryRandomHere;
> + TimeStampValue(_SomethingVeryRandomHere* nullValue);
> + TimeStampValue(const TimeStampValue &aOther);
Yeah, your implementation here is the same as the compiler's.
Attachment #706202 -
Flags: review?(tellrob) → review+
Assignee | ||
Comment 8•12 years ago
|
||
(In reply to Rob Arnold [:robarnold] from comment #7)
> Comment on attachment 706202 [details] [diff] [review]
> v1.1
>
> Review of attachment 706202 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I'd prefer to see a more lenient way of disabling QPC but that could be in a
> separate patch. I worry that since arithmetic is allowed on timestamps, it
> is possible to erroneously trigger the heuristic.
I'll file a bug. Landing this is similar to what we have now.
>
> ::: xpcom/ds/TimeStamp_windows.h
> @@ +27,5 @@
> > + bool CheckQPC(int64_t aDuration, const TimeStampValue &aOther) const;
> > +
> > +public:
> > + struct _SomethingVeryRandomHere;
> > + TimeStampValue(_SomethingVeryRandomHere* nullValue);
>
> I should have asked about this...some new Mozilla convention I'm unaware of?
This is copied from TimeDuration constructor.
>
> @@ +28,5 @@
> > +
> > +public:
> > + struct _SomethingVeryRandomHere;
> > + TimeStampValue(_SomethingVeryRandomHere* nullValue);
> > + TimeStampValue(const TimeStampValue &aOther);
>
> Yeah, your implementation here is the same as the compiler's.
Removed in the next patch version.
Thanks.
Assignee | ||
Comment 9•12 years ago
|
||
- appropriate globals turned to static
- using implicit copy ctor and = op for TimeStampValue
- using std::abs where appropriate
- fixed log (s/%d/%llu/) that falsely led me to think of an arithmetic bug
All over the review comments are nits, so I just carrying forward r+.
I'll be testing this a short time on my 2 core old machine to check logs and other stuff works.
Attachment #706202 -
Attachment is obsolete: true
Attachment #708714 -
Flags: review+
Assignee | ||
Comment 10•12 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #9)
> I'll be testing this a short time on my 2 core old machine to check logs and
> other stuff works.
The algorithm as is now may be too sensitive. I got some jitter that went 10ms over the threshold for two stamps just 14ms each from other. This cannot happen in the current code, so we would regress. I will have to fix bug 836869 inside this one, or just make thresholds more benevolent before landing.
Assignee | ||
Comment 11•12 years ago
|
||
- raise of the tolerance threshold from 32ms to 64ms +- ; actually, the timeIncrement * 4
- added few more logs for TSC, timeIncrement and QPC frequecy
I'm not sure whether this even needs a rereview...
Attachment #708714 -
Attachment is obsolete: true
Attachment #711958 -
Flags: review?(tellrob)
Comment 12•12 years ago
|
||
Comment on attachment 711958 [details] [diff] [review]
1.3
Review of attachment 711958 [details] [diff] [review]:
-----------------------------------------------------------------
I didn't look at the interdiff but it seems fine based on what you've said.
Attachment #711958 -
Flags: review?(tellrob) → review+
Assignee | ||
Comment 13•12 years ago
|
||
Comment on attachment 711958 [details] [diff] [review]
1.3
https://hg.mozilla.org/integration/mozilla-inbound/rev/604e819f63d7
Attachment #711958 -
Flags: checkin+
Comment 14•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Comment 15•12 years ago
|
||
I landed a fixup for GCC that requires cstdlib included for std::abs:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d3e16515f630
Comment 16•12 years ago
|
||
> https://hg.mozilla.org/integration/mozilla-inbound/rev/d3e16515f630
I already have a reviewed patch in Bug 840911 :P
Comment 17•12 years ago
|
||
Comment 18•12 years ago
|
||
(In reply to Philip Chee from comment #16)
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/d3e16515f630
> I already have a reviewed patch in Bug 840911 :P
I didn't notice it, sorry about that.
Comment 19•12 years ago
|
||
Mochitest 3 test "performance_now" seems to have increased frequency of intermittent failures, possibly as a result of this bug. Some analysis on bug 749894 suggests that on the try servers, we're getting "QPC Disabled" at about 25% of the tests (which can sometimes fail the test, but not always).
Need to decide if the "QPC disabled" case should pass of fail that test, and adjust the test accordingly.
Depends on: 749894
Comment 20•12 years ago
|
||
(In reply to Avi Halachmi (:avih) from comment #19)
> ... Some analysis on
> bug 749894 suggests that on the try servers, we're getting "QPC Disabled" at
> about 25% of the tests ...
On windows XP, with higher percentage at XP-debug builds (than XP-opt).
Comment 21•12 years ago
|
||
Although I think this patch looks fine, I was surprised to find it when reading TimeStamp hg logs. In the future, please ask an XPCOM peer for review or at least ask us to delegate the review.
Assignee | ||
Comment 22•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #21)
> Although I think this patch looks fine, I was surprised to find it when
> reading TimeStamp hg logs. In the future, please ask an XPCOM peer for
> review or at least ask us to delegate the review.
Ups, didn't cross my mind. Do you want to check on the patch now and make me some changes?
Comment 23•12 years ago
|
||
Nono, like I said, the patch looks fine. I just wish I had been aware of it ;-)
You need to log in
before you can comment on or make changes to this bug.
Description
•