Closed Bug 1715922 Opened 3 years ago Closed 3 years ago

Busy processes don't get sampled at the expected rate

Categories

(Core :: Gecko Profiler, defect, P2)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
91 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox89 --- unaffected
firefox90 --- wontfix
firefox91 --- fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

Found by :jesup :

The profile of CNN I have proves that it samples at 16ms when we're running 100% of the time for seconds
https://share.firefox.dev/3gor8Gj

"Isolated Web Content (3/3)" pid 31472 looks fairly busy starting around 4.2s, but the sampling intervals stay at 15-16ms for that process.

As per bug 1703410 comment 5, I expected busy threads to make other timers in the same process (including the profiler's) be serviced quicker than the default 64Hz.

I still think bug 1703410 did a useful thing, by avoiding to mess with the default timer resolution, as this was causing real issues to become hidden when starting the profiler. But in some cases like this one, it can prevent useful profiling of some busy processes.

The easiest solution would be to add an option to change the timer resolution.

More intelligently&complex, maybe the profiler could only adjust the timer resolution when it detects high CPU levels after a few long intervals? But I'm scared that this could trigger more unwanted timer-firings in all processes (than would happen without the timer resolution change), which would in turn increase the CPU load, which would prevent returning the timer resolution to its default 64Hz. πŸ€”

:jesup's opinion:

I may not know all the downsides of bumping the timer rate on windows, but IMO we should default to bumping it, and have a toggle to drop to 16ms (or rather whatever windows will default to, depending on what else is running). Windows will bump your timer rate depending on other apps, so we need to handle that smoothly anyways. To be able to get profiles in those rare? cases where the timer rate matters, a toggle will provide the needed ability.

Thank you for your input.

Since we (most probably) cannot please everybody, we'll need to decide which default to take.

OS: Unspecified → Windows

Note: Setting the environment variable PROFILER_ADJUST_TIMER_RESOLUTION to anything restores the timer resolution changes. Maybe it should be flipped?

Set release status flags based on info from the regressing bug 1703410

This reverses bug 1703410: By default the profiler changes the timer resolution (normally 64Hz) when the requested sampling interval is lower than 10ms, to allow fast-enough sampling for most uses.

But since this can influence other timers in Firefox, which makes debugging some issues more difficult. To help with this, there is now a "Keep Timer Resolution" on Windows, which prevents the profiler from changing the timer resolution, at a risk of slowing down sampling in some processes.

Assignee: nobody → gsquelart
Status: NEW → ASSIGNED

:jesup, I haven't been able to reproduce the issue you experienced, so I'm flying blind with this patch! Would you be able to try it please, and check that the "Keep Timer Resolution" in about:profiling is making the issue visible again (proving that timers are not tweaked in that case)? TIA.

If you don't have time to use the patch from the review, you can find pre-built binaries in this try: https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&revision=74ea848a8d2498116fe69d3f8fa45eef535b5c80

Flags: needinfo?(rjesup)

This definitely fixes the 16ms issue; and checking the box drops back to 16ms.

I do note that without the box checked, I'm getting 2ms sampling on all processes sampling cnn.com (or just example.com). Changing to 2ms requested keeps it at 2ms, so this implies it won't go faster than 2ms. This is on an unloaded Ryzen 3600 (i.e. fast-ish) desktop. I don't know what the sampling rate was before.

Flags: needinfo?(rjesup)

Thank you :jesup, we'll push ahead with the patch.

About the 2ms minimum, we already have bug 1574815 to see if it can be fixed (doubtful) and bug 1574816 to change the default to 2ms so it's a less visible issue.
Note that you can set the sampling interval to a value lower than 1ms, in which case we switch to a yield-until loop, which can wait smaller intervals but at a higher overhead cost.

Attachment #9226758 - Attachment description: Bug 1715922 - Windows timer resolution is changed by default, except if keeptimerresolution feature is set - r?canaltinova → Bug 1715922 - Windows timer resolution is enhanced by default, except if notimerresolutionchange feature is set - r?canaltinova
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f3199f26ef6e
Windows timer resolution is enhanced by default, except if notimerresolutionchange feature is set - r=canaltinova
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 91 Branch
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: