Closed Bug 1690669 Opened 4 years ago Closed 2 years ago

Too much CPU usage spent in the native event loop when it is woken up by another thread

Categories

(Core :: Widget: Cocoa, defect, P3)

All
macOS
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: mstange, Unassigned)

References

Details

(Keywords: power)

Profile: https://share.firefox.dev/2YFEghw

Our native event loop on macOS seems to have a lot of overhead. This causes unnecessary CPU usage if other threads post events to the main thread of the parent process (which is something that happens all the time).

Steps to reproduce:

  1. Open the browser console. (Not the web console! We want to execute JS in the parent process. Set devtools.chrome.enabled to true if you need to.)
  2. Paste the following into the console's text field and press enter:
let eventChurner = (function () {
  let messageName = 'ping-pong';
  let workerJs = `
    onmessage = e => {
      postMessage(e.data);
    };
  `;
  let blobUrl = URL.createObjectURL(new Blob([workerJs]));
  let worker = new Worker(blobUrl);
  function handleMessage(event) {
    worker.postMessage(event.data);
  }

  return {
    start: () => {
      worker.addEventListener('message', handleMessage);
      worker.postMessage('ping-pong');
    },
    stop: () => {
      worker.removeEventListener('message', handleMessage);
      worker.terminate();
    },
  };
})();
  1. Execute eventChurner.start().
  2. Start your profiler, wait for a bit, and capture the profile.
  3. Important: execute eventChurner.stop(). (otherwise you'll keep burning CPU)

Expected results:
Most of the time should be spent either idle or running the Gecko Worker runnable.

Actual results:
https://share.firefox.dev/2YFEghw
0% idle, ~7% Worker runnable, 93% event loop overhead.

I wouldn't be very surprised if we have similar issues in parent process also in other platforms.
AppShell code hasn't been optimized.... ever?

The macOS AppShell code was last optimized 7 years ago, in bug 996848. (Filing this bug gave me a déjà vu of filing bug 606937...)

Depends on: 1690687
Severity: -- → S2
Priority: -- → P2

So it looks like just removing this call to postEvent might be sufficient to mostly fix the problem. I believe, the other call to postEvent is only used when we're nesting.

I've pushed to try to see how things look:
https://treeherder.mozilla.org/jobs?repo=try&revision=1162208aa7cdab03c0e29f843292fdf7addfbdaf

I reprofiled with bug 1690687 landed.
https://share.firefox.dev/2WWTj9D

It looks a lot better.

Much better!

My profile from comment 0 has ~4800 message events per second. Your profile from comment 4 has ~14000 message events per second. So about a 3x throughput increase.

(I executed the following on the web console:)

function avgRate(markers) {
  const firstStart = markers[0].start;
  const lastStart = markers[markers.length - 1].start;
  const duration = lastStart - firstStart;
  const markersPerMs = (markers.length - 1) / duration;
  return markersPerMs * 1000;
}
avgRate(filteredMarkers.filter(m => m.name === "DOMEvent"))

Markus, given your comment I'm lowering the severity, but perhaps this bug can be outright closed?

Severity: S2 → S3
Flags: needinfo?(mstange.moz)
Priority: P2 → P3

Sounds good.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(mstange.moz)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.