Open Bug 1369494 Opened 7 years ago Updated 1 year ago

Consider optimizing setTimeout some

Categories: Core :: DOM: Core & HTML (enhancement, P3)

People: (Reporter: ehsan.akhgari, Unassigned)

Attachments: (1 file)

From bug 1363829 comment 173:
> For future reference, here is a profile of:
> 
> https://mozdevs.github.io/servo-experiments/experiments/tiles/
> 
> On my mac book pro using 55.0a1 (2017-05-28) (64-bit):
> 
> https://perfht.ml/2soSOQz

If you look at the profile (https://perfht.ml/2rZ731O), around 90% of the time under TimeoutManager::SetTimeout() is spent under mozilla::dom::TimeoutManager::Timeouts::Insert().  The only possible explanation is that the iteration over the timeouts linked list is extremely expensive: <https://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/dom/base/TimeoutManager.cpp#1290>.

We should try to see if we can optimize this a bit.

Note that this isn't as simple as "avoid using a linked list" because the iteration here is backwards, which would be pretty cache hostile even with a normal array storage...  I haven't yet thought about any solutions here.
I actually am already trying a patch locally for this.

The bigger reason we cannot get rid of the linked list is that setTimeout() has some weird ordering constraints.  Consider that we must obey the insertion point semantics in RunTimeout(), etc.  A sorting algorithm based on time would not respect this.
I'm trying a heuristic where we start our insertion search at the last point we inserted a timeout instead of always starting at the end.  This will be no worse than what we do today, but could help with sites that schedule a lot of timers.
Attached patch wip (deleted) — Splinter Review
This didn't work very well.  It doesn't paint the tiles page correctly and it doesn't seem to improve searching at all.  I'm posting here, but not going to look at this approach any further.

Maybe we could try some kind of skip list:

* Make the timer list a linked list of N ms "buckets" based on aligned epoch time boundaries
* Each bucket contains a linked list of the timeouts falling within that bucket time range
* To find an insertion point we would first find the bucket range and then the position within the bucket.

So we would in effect be skipping potentially large chains of timeouts in each bucket as we searched.

It's unclear if the current RunTimeout() insertion point semantics could be implemented with this approach, though.
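The bucket idea above could be sketched roughly like this. All names here (BucketedTimeouts, the standalone Timeout struct, the millisecond `when` field) are illustrative stand-ins, not the real TimeoutManager types, and std::map stands in for whatever bucket index the real patch would use:

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>

// Hypothetical sketch: timeouts grouped into N ms buckets keyed by an
// aligned epoch boundary, with an ordered list inside each bucket.
struct Timeout {
  uint64_t when;  // absolute fire time in ms
  int id;
};

class BucketedTimeouts {
 public:
  explicit BucketedTimeouts(uint64_t bucketMs) : mBucketMs(bucketMs) {}

  void Insert(const Timeout& t) {
    // Skip directly to the right bucket instead of walking every timeout.
    uint64_t key = t.when - (t.when % mBucketMs);
    std::list<Timeout>& bucket = mBuckets[key];
    // Short in-bucket search; inserting after equal 'when' values keeps
    // same-deadline timeouts in registration order.
    auto it = bucket.begin();
    while (it != bucket.end() && it->when <= t.when) ++it;
    bucket.insert(it, t);
  }

  // Pop the earliest timeout; returns false if the structure is empty.
  bool PopFirst(Timeout& out) {
    while (!mBuckets.empty() && mBuckets.begin()->second.empty()) {
      mBuckets.erase(mBuckets.begin());
    }
    if (mBuckets.empty()) return false;
    out = mBuckets.begin()->second.front();
    mBuckets.begin()->second.pop_front();
    return true;
  }

 private:
  uint64_t mBucketMs;
  std::map<uint64_t, std::list<Timeout>> mBuckets;  // key = aligned bucket start
};
```

With thousands of timeouts clustered around similar deadlines, the linear scan is bounded by the bucket size rather than the whole list, which is the "skipping" the comment describes.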
Assignee: nobody → bkelly
Uh, did not mean to assign myself here.  Not actively working at the moment.
Assignee: bkelly → nobody
Priority: -- → P2
FWIW I found some disheartening numbers looking at Ben's profiles from <https://mozdevs.github.io/servo-experiments/experiments/tiles/> which I posted in bug 1363829 comment 176, reposting here:

I broke in the JS debugger in the servo demo to see how many tiles I have.  I had a 91*50 grid.  That means we'll iterate over 4549 timeouts *in the last iteration of the loop*, and I bet the real number is much lower than that.  And this loop only runs one time in the beginning of each round of a new effect rolling over a new picture (so not frequently at all).  This really shouldn't be that slow, and we should barely see it in the profile...

Someone should check how many timeouts we are actually iterating over in this test case so that we can get a real sense of how much slowness we're talking about, but if my numbers above are correct this is pretty scary.  It may be a good idea to look at other iterations we do over the list of timeouts as well...
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #1)
> I actually am already trying a patch locally for this.
> 
> The bigger reason we cannot get rid of linked list is that setTimeout() has
> some weird ordering constraints.  Consider that we must obey the insertion
> point semantics in RunTimeout(), etc.  A sorting algorithm based on time
> would not respect this.

I'm not sure I'm following, do you mind elaborating please?

For example, we can try out a level-ordered sorted binary search tree which would be efficient to search (but perhaps bad for reverse traversal of course).  Then we can implement the insertion point by storing the time of the timeout at the insertion point which will allow us to search for and insert a timeout at the specific spot.  I don't see what's inherent in the semantics here that makes linked lists unavoidable...
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #6)
> (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #1)
> > I actually am already trying a patch locally for this.
> > 
> > The bigger reason we cannot get rid of linked list is that setTimeout() has
> > some weird ordering constraints.  Consider that we must obey the insertion
> > point semantics in RunTimeout(), etc.  A sorting algorithm based on time
> > would not respect this.
> 
> I'm not sure I'm following, do you mind elaborating please?

The problem is the insertion point can split timeout entries at the same TimeStamp resolution.  If the sort doesn't maintain exact relations between same-value entries then we can't use it.

I guess we could have some one-up counter that we increase whenever we would have added a dummy insertion point marker.  New timers get the new ID and it will auto-sort later than previous ones.  We would have to be careful about wrapping, though.

Maybe our "FiringId" is already this dummy counter.
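The one-up counter idea could look something like this minimal sketch. Everything here is hypothetical illustration (the standalone Timeout struct, NextSeq(), FiresBefore()); a 64-bit counter is used so the wrapping concern is a non-issue in practice:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch: each new Timeout takes a monotonically increasing
// sequence number, so entries with the same TimeStamp still have a total
// order that matches registration order.
struct Timeout {
  uint64_t when;  // fire time
  uint64_t seq;   // insertion order, assigned from a one-up counter
};

inline uint64_t NextSeq() {
  static uint64_t sCounter = 0;
  return sCounter++;
}

// Strict weak ordering: earlier 'when' first; ties broken by insertion order.
inline bool FiresBefore(const Timeout& a, const Timeout& b) {
  if (a.when != b.when) return a.when < b.when;
  return a.seq < b.seq;
}
```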
If we can remove the dummy insertion point logic I think this problem becomes easier.  Maybe we should do that first.
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #5)
> Someone should check how many timeouts we are actually iterating over in
> this test case so that we can get a real sense of how much slowness we're
> talking about, but if my numbers above are correct this is pretty scary.  It
> may be a good idea to look at other iterations we do over the list of
> timeouts as well...

I did this yesterday.  Many setTimeout() calls iterate over many timeouts each.  The count climbs to 4000, 5000, etc timeouts per call.  It's not just the last loop iteration.
I'm trying to remove the insertion point stuff over in bug 1370025.

Even if that's successful, though, if we want to switch to something like a normal sorting algorithm we will have to pick one that does a stable sort.  We can't let the sorting algorithm reorder this:

  setTimeout(f, 0);
  setTimeout(g, 0);
  setTimeout(h, 0);

So I don't think binary sort is an option.
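The constraint in the JS snippet above is exactly sort stability: f, g, and h share a deadline but must run in registration order. As a rough C++ illustration (hypothetical Timeout type, with a char standing in for the callback), std::stable_sort preserves that order while std::sort makes no such guarantee for equal keys:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of the stability constraint from the setTimeout(f/g/h, 0) example.
struct Timeout {
  int when;
  char callback;  // stands in for f, g, h
};

std::vector<Timeout> SortedByDeadline(std::vector<Timeout> list) {
  // stable_sort preserves the relative order of equal 'when' values;
  // plain std::sort is allowed to reorder them.
  std::stable_sort(list.begin(), list.end(),
                   [](const Timeout& a, const Timeout& b) {
                     return a.when < b.when;
                   });
  return list;
}
```

The alternative, as discussed above, is an unstable structure plus a sequence ID in the comparison, which achieves the same observable ordering.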
Depends on: 1370025
Well, I guess we could use some kind of sequence ID that is used to order entries when the TimeStamp is equal.
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #11)
> Well, I guess we could use some kind of sequence ID that is used to order
> entries when the TimeStamp is equal.

Yeah, sort stability can usually be handled in the sort comparison function: when two entries would otherwise compare equal, break the tie with a sequence ID like the one you suggested!
So how does the landscape look now with bug 1370025?  Have you had time to think about this more?
I've continued to think about it.  It's easier now, but it's still a bit tricky because of ResetTimersForThrottleReduction().  I think we should try to get rid of that next.
Bug 1371787 removes another place where we change Timeout::When() values.  Again, this makes it easier for us to change the list data structure because there are fewer places where we might have to re-sort.  (Of course the code being removed in bug 1371787 is kind of buggy in that it *ignores* the possibility of re-sorting.)
Depends on: 1371787
I'm not sure I will have time to work on this for a while.  Let me write down my current thoughts.

I think probably the best way forward is to move towards a binary heap sorted timeouts list.  This is the same thing we are using in TimerThread at the moment.  The main trickiness involved is that you can't iterate the list in order without modifying it.  So this will require some more refactoring.  I think we can achieve it, though, by doing:

1. Create a natural sort order by using a sequence ID on Timeout objects.
2. Make the top loop of RunTimeout() actually remove the Timeout objects to be executed and place them in a new "currently executing list".
  a. This "currently executing list" would need to be part of the mFiringIdStack since we can have nested RunTimeout() calls.
  b. The second loop in RunTimeout() would only iterate over its own "currently executing list".
  c. Any items left on the "currently executing list" at the end of RunTimeout() would need to be put back into the main timeout list.
  d. ClearTimeout() and ClearAllTimeouts() would need to take into account the "currently executing lists" on the mFiringIdStack
3. Remove the concept of FiringId.  With the natural sort order and the separate "currently executing lists" we will no longer need FiringId.
4. Remove all calls to Timeout::getNext(), getPrevious(), remove(), etc outside of the Timeouts list code.
5. Change the Timeouts list code to only expose FirstTimeout(), Push(), Pop(), and unordered iteration.
6. Make Timeouts list code use nsTArray with std::push_heap/pop_heap internally.
Oh, we would also need to make ClearTimeout() mark a Timeout as cancelled without actually removing it from the list.  We would then just drain any cancelled timeouts from the head of the list whenever we look at it.
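Steps 1, 5, and 6 of the plan above, combined with the lazy-cancellation idea in the last comment, might look roughly like this. This is only a sketch under several assumptions: std::vector stands in for nsTArray, shared_ptr stands in for whatever refcounting the real Timeout uses, and all names (Timeouts, Later, DrainCancelled) are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Binary heap over (when, seq): ClearTimeout() only flags an entry, and
// cancelled entries are drained when they reach the head of the heap.
struct Timeout {
  uint64_t when;
  uint64_t seq;          // natural sort order for equal 'when' values
  bool cancelled = false;
};

struct Later {
  // std::push_heap builds a max-heap, so invert the comparison to keep
  // the soonest (smallest when, then smallest seq) timeout at the front.
  bool operator()(const std::shared_ptr<Timeout>& a,
                  const std::shared_ptr<Timeout>& b) const {
    if (a->when != b->when) return a->when > b->when;
    return a->seq > b->seq;
  }
};

class Timeouts {
 public:
  void Push(std::shared_ptr<Timeout> t) {
    mHeap.push_back(std::move(t));
    std::push_heap(mHeap.begin(), mHeap.end(), Later{});
  }

  // Returns the earliest live timeout, or nullptr when none remain.
  std::shared_ptr<Timeout> Pop() {
    DrainCancelled();
    if (mHeap.empty()) return nullptr;
    std::pop_heap(mHeap.begin(), mHeap.end(), Later{});
    auto t = std::move(mHeap.back());
    mHeap.pop_back();
    return t;
  }

 private:
  void DrainCancelled() {
    // Cancelled timeouts stay in the heap until they surface at the head.
    while (!mHeap.empty() && mHeap.front()->cancelled) {
      std::pop_heap(mHeap.begin(), mHeap.end(), Later{});
      mHeap.pop_back();
    }
  }

  std::vector<std::shared_ptr<Timeout>> mHeap;
};
```

Draining only from the head keeps ClearTimeout() O(1) at the cost of cancelled entries lingering in the heap, which matches the trade-off described in the comment.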
Moving to P3 because there has been no activity for at least 1 year.
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Component: DOM → DOM: Core & HTML
Severity: normal → S3

Just FYI, the performance of TimeoutManager::Timeouts::Insert() should be improved (on Windows only) by the TimeStamp changes from https://bugzilla.mozilla.org/show_bug.cgi?id=1841352. In my local stress test of that function I saw a reduction of 12-13% of the cost. (TimeStamp comparisons were previously very expensive on Windows.)
