Closed Bug 1685938 Opened 4 years ago Closed 4 years ago

Reduce gap between CPU measurements and sample timestamp

Tracking

()

Status:

RESOLVED FIXED

Milestone:

86 Branch

Tracking Flags:

Tracking

Status

firefox86

---

fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

Bug 1685938 - Remove unused parameters in DoPeriodicSample - r?canaltinova 4 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) (deleted), text/x-phabricator-request		Details
Bug 1685938 - Precompute sample timestamps as milliseconds in sampler loop - r?canaltinova 4 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) (deleted), text/x-phabricator-request		Details
Bug 1685938 - Add RunningTimes::mPostMeasurementTimeStamp and use as sample timestamp - r?canaltinova 4 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) (deleted), text/x-phabricator-request		Details
Bug 1685938 - Keep CPU measurements and their timestamp within 8*median duration - r?canaltinova 4 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) (deleted), text/x-phabricator-request		Details

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Description

•

4 years ago

A profile taken on MacOS after bug 1329600 seems to have 0% CPU usage everywhere.

Looking at the raw data (accessible through profile in the JS console), this is due to apparent spikes in CPU usage, which themselves are most probably due to gaps between the time CPU measurements are taken and the sample timestamp. These gaps can be caused by the OS scheduling another thread.

Consider the following sequence:

sampler loop 1: CPU, (long gap), timestamp, stack, no sleep (to "catch up")
sampler loop 2: CPU, (NO gap), timestamp, stack...

Because of the gap, we have CPU[2]-CPU[1] covering a much longer time than timestamp[2]-timestamp[1]. So when we compute the CPU usage over time, we get a spike. The above profile shows spike going for an apparent 15x the theoretical maximum!

One issue is the tiny sleep between sampler loops, which makes things worse. This will be dealt in separate bug 1685937.

The main issue we have is due to the gap between CPU measurements and timestamp.
On all supported systems, these are collected with different APIs, so we cannot guarantee that there couldn't be such a gap.
I doubt OSes provide ways to prevent scheduling interruptions. Maybe a higher thread priority could help though? In any case, I'll start with a generic solution, we can try to tweak it on different OSes later on.

The proposed solution here is to record timestamps immediately before and after the CPU measurements, and in cases where the time difference is "too long", we'd repeat the operation.
"Too long" can be determined with a quick test loop at the start of the profiling session.
With this, long gaps should be eliminated, so we can trust that (CPU[2]-CPU[1])/(timestamp[2]-timestamp[1]) is reliable.

The cost of repeating measurements should be negligible: The API calls themselves are short (in the order of a microsecond), so when we detect a "too long" gap (probably in tens of microseconds or more), it would indicate that the system was probably busy with something else (so there's little CPU waste), and the stack sample would have been delayed anyway, so it shouldn't hurt to delay it a little more, in order to get better CPU usage values.

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Updated

•

4 years ago

Blocks: 1686172

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 1

•

4 years ago

Attached file Bug 1685938 - Remove unused parameters in DoPeriodicSample - r?canaltinova (deleted) — Details

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 2

•

4 years ago

Attached file Bug 1685938 - Precompute sample timestamps as milliseconds in sampler loop - r?canaltinova (deleted) — Details

This handles the conversion (from TimeStamp to number of milliseconds since process start) once and gives it to subroutines.
It will also help in a following patch where this value will be more closely tied with the CPU usage value.

Disambiguated nested delta variables in the Sampler:

sampleStartDeltaMs is at the start of each sampling loop,
threadSampleDeltaMs is associated to one thread being sampled during that loop.

Depends on D101543

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 3

•

4 years ago

Attached file Bug 1685938 - Add RunningTimes::mPostMeasurementTimeStamp and use as sample timestamp - r?canaltinova (deleted) — Details

mPostMeasurementTimeStamp records the time right after CPU measurements ended.
It is then used as the main sample timestamp, to both avoid another TimeStamp::Now() call, and to keep them as close together as possible.

Depends on D101544

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 4

•

4 years ago

Attached file Bug 1685938 - Keep CPU measurements and their timestamp within 8*median duration - r?canaltinova (deleted) — Details

Depends on D101545

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Updated

•

4 years ago

Blocks: 1687402

Pulsebot

Comment 5

•

4 years ago

Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cefa452f7131 Remove unused parameters in DoPeriodicSample - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/a42b21d1612a Precompute sample timestamps as milliseconds in sampler loop - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/5eb3bdbbdcd5 Add RunningTimes::mPostMeasurementTimeStamp and use as sample timestamp - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/090fc4f02d83 Keep CPU measurements and their timestamp within 8*median duration - r=canaltinova

Sandor Molnar

Comment 6

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cefa452f7131
https://hg.mozilla.org/mozilla-central/rev/a42b21d1612a
https://hg.mozilla.org/mozilla-central/rev/5eb3bdbbdcd5
https://hg.mozilla.org/mozilla-central/rev/090fc4f02d83

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox86: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 86 Branch

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Reduce gap between CPU measurements and sample timestamp

Categories

(Core :: Gecko Profiler, enhancement, P2)

Tracking

()

People

(Reporter: mozbugz, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(4 files)

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Attachment

General

Description

File Name

Content Type