Closed Bug 746869 Opened 12 years ago Closed 12 years ago

Console is abysmally slow

Categories

(DevTools :: Console, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: bzbarsky, Unassigned)

References

(Depends on 1 open bug)

Details

From a mailing list post I read earlier today:

   Firefox's console is catastrophically broken; on my system, it takes about
   4ms *per log*.  

I had a hard time believing this, so I tried a testcase:

  var start = new Date();
  for (var i = 0; i < 1000; ++i)
    console.log(i);
  var end = new Date();
  alert(end - start);

The alert indeed showed 4000ms or so.
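As an aside, a minimal variant of the same testcase using performance.now() (assuming the build supports it) avoids Date object churn and gives sub-millisecond resolution:

  var start = performance.now();
  for (var i = 0; i < 1000; ++i)
    console.log(i);
  // performance.now() returns fractional milliseconds
  alert((performance.now() - start) + " ms for 1000 calls");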

So I did a profile.  I'll file bugs blocking this one on particular issues from it, but here's a summary.

Some time is spent in general console.log security-check overhead, XPConnect, etc.  But 91% of the total time is spent notifying console service observers, and in particular the HUDService.  This breaks down as follows (all percentages of total runtime):

  40% of the time is spent flushing layout from scrollHeight getters.
  11% is spent calling querySelectorAll.
  11% is spent appending nodes (eager frame construction and whatnot, presumably
      due to XBL bindings).
   6% removing nodes from the document (a good chunk of this is XBL gunk).
   5% calling XPCOM goop (nsScrollBoxObject::EnsureElementIsVisible,
      nsJCID::CreateInstance, nsStringBundle::FormatStringFromName,
      nsWindowMediator::GetMostRecentWindow, nsObserverService::NotifyObservers,
      nsGlobalWindow::GetInterface, etc, etc).
   4% setting scrollTop.
   4% getting a non-numeric property from nodelists.  Maybe .length?  Hard to
      tell.
   4% calling the classList getter on elements.
   3% getting numeric properties from nodelists.
   3% adding/removing classes via classList.
   
And a long tail of creating elements, creating text nodes, adding event listeners, .childNodes, .style gets, .tagName gets, .ownerDocument gets, etc, etc, etc.
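To illustrate the biggest line item: reading scrollHeight after a DOM mutation forces a synchronous layout flush, so a loop like the following flushes once per message (makeNode, output, and messages are hypothetical stand-ins, not the console's actual names):

  // One synchronous layout flush per iteration: each scrollHeight read
  // follows a DOM write, so layout can never stay dirty across messages.
  for (var i = 0; i < messages.length; ++i) {
    output.appendChild(makeNode(messages[i]));
    output.scrollTop = output.scrollHeight;
  }

Doing all the appends first and reading scrollHeight once at the end drops that to a single flush.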
Depends on: 746870
Depends on: 746871
Depends on: 746872
I filed some bugs blocking this one on specifics, but there are two general algorithmic problems that need to be solved no matter what micro-optimization is done on the guts:

1)  The console does a _lot_ of work for every message.
2)  The console does all this work (which involves going through all messages!) eagerly on
    every message instead of coalescing it across messages.
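For problem 2, a minimal coalescing sketch (renderBatch is a hypothetical single-DOM-update renderer, and the 50ms interval is arbitrary):

  var queue = [];
  var flushScheduled = false;

  function onMessage(msg) {
    queue.push(msg);
    if (!flushScheduled) {
      flushScheduled = true;
      // Coalesce everything that arrives before the timer fires.
      setTimeout(flushQueue, 50);
    }
  }

  function flushQueue() {
    flushScheduled = false;
    var batch = queue.splice(0, queue.length);  // take the whole queue
    renderBatch(batch);  // one DOM update for the batch, not one per message
  }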
Thank you for these data. This is going to help a lot. (And we are aware of this performance problem: bug 722685.)

Mihai, will your latest work on the async web console improve the overall performance?
Ah, indeed.  I should have searched for dups!

Adding a dep, but feel free to dup as desired, of course.

I'm also happy to reprofile with any proposed patches.
Blocks: 722685
Awesome bug reports, Boris! Thank you very much!

Indeed, in bug 673148 we are working on async web console which decouples all the network, console API and script error listeners from the UI code. That code started out with the e10s API, but we will switch to the remote debugging protocol once the main patches land. This will allow us to use the web console with remote instances of Firefox.

However, bug 673148 does not affect the UI, so it's still slow.

We plan to move the Web Console UI into its own iframe, but applying your suggestions for performance improvements is, I believe, a must. We also want to rewrite the way messages are output in the Web Console.

I hope we can do these perf improvements ASAP.
I think we could do some rework of the actual log output mechanism when moving to the iframe as mihai suggests.

I'd rather not deflect mihai's efforts on the async back end for now as it's a fairly heavy set of patches.

But this is great information. Thanks for doing it, Boris. We may indeed want to reprofile with some test patches.
that said, I do think there might be a couple of easy wins around logging output that shouldn't interfere too much with what mihai's doing.

any victi^H^H^H^Holunteers?
Boris: please let us know if we can close this bug and the dependencies, now that bug 722685 is fixed. Any further suggestions on how we can improve performance are very much welcome. Thank you!
So the behavior I see in this morning's nightly using the testcase from comment 0 and loading it with the console _open_ is as follows:

1)  Alert comes up saying "52" (52ms total, so about 52us per console.log call).
2)  The actual logged things appear over the next 2-3 seconds in the web console.
3)  The browser appears to be reasonably responsive during those 2-3 seconds.

So the behavior seems to be a lot better, though the "things slowly scrolling in" bit is a bit odd.

If I do those steps with the console _closed_, then when I later actually open the console I see the 1000 log entries appear one after the other (as in, when I open it, it's only showing the first 200-300, and then the rest seem to get added a few at a time or something).

So we can probably close this bug as filed, but there is still a problem of some sort: I would think that item 2 above and the behavior on later opening the console are both slightly odd.

Also, if I bump the iteration count to 10000 I get a nice UI freeze while the 10000 items scroll into view.
I'll take a look in a profiler at both what's involved in those 52us and the UI freeze with 10000 iterations.  Up to you what you think of item 2 above.
Thank you, Boris. I am looking forward to your analysis and suggestions for improvements.

I believe there's quite a bit of room to improve things, as mentioned in the bug where I submitted the patch. Big workloads like 10000 calls are still a problem, but we are no longer slower than the competition - I hope.

In bug 722685 comment 11 I mentioned we need to avoid creating the message nodes - I believe that would improve things a lot.

Also, this timer thing - displaying a few messages in a queue, every few ms, is a bit "weird". There's a balance to strike: if I increase the delays, I allow the queue to fill up and avoid displaying all 10000 messages - just display the last 200. However, if the delays are too long, output feels laggy for smaller workloads. Not sure how we can better balance things. Any thoughts on this?
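One way to picture the tradeoff (purely a hypothetical sketch, not the console's actual logic): make the flush delay a function of queue size, so small workloads flush almost immediately while floods wait longer and get pruned harder:

  // MIN_DELAY, MAX_DELAY, and the /10 scale factor are arbitrary
  // illustration values, not tuned numbers.
  var MIN_DELAY = 5, MAX_DELAY = 100;

  function scheduleFlush(queue) {
    // Small queues flush fast; big queues wait longer, so more of the
    // backlog can be pruned down to the last N messages in one go.
    var delay = Math.min(MAX_DELAY, MIN_DELAY + queue.length / 10);
    setTimeout(flushQueue, delay);
  }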
Alright, so a few things.

The 50ms is in large part xpconnect overhead:
  * 1/4 of total time is just JS-wrapping all the nsIStackFrame objects when we repeatedly
    walk the .caller chain.
  * Various security checks
  * DefinePropertyIfFound for each prop on each stackframe (because HasProto() is false)

All told, that stuff adds up to 50us or so per log() call...  I wonder whether we can make this better somehow.  Can we move stack frames over to webidl?  ;)

The freeze with 10000 calls is more web console code itself.  I'm seeing a ton of insertBefore _and_ removeChild in there, as well as layout flushes.  Are we just going through inserting all the things and then removing earlier ones when we insert later ones?  If so, can we optimize away part of that?  So if we know we allow 200 messages max, but we have 1000 in the queue, just drop the first 800 from the queue?

> displaying a few messages in a queue, every few ms, is a bit "weird".

I think the real issue I'm having is that we know the queue has 1000 messages, but we still display only a few at a time.  This makes the total wall-clock time to get to the last message much higher...  Modulo possible jank issues, it seems the optimal behavior is:
  1) Timer fires
  2) Trim all but the last 200 messages from the queue
  3) Insert all 200 nodes without doing any layout flushes
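A hedged sketch of those three steps (MAX_MESSAGES and createMessageNode are hypothetical names, not the console's actual ones):

  var MAX_MESSAGES = 200;

  function flushQueue(queue, output) {
    // Step 2: keep only the newest MAX_MESSAGES entries; drop the rest.
    var batch = queue.splice(-MAX_MESSAGES);
    queue.length = 0;

    // Step 3: build the nodes off-document so nothing flushes layout,
    // then attach them all with a single appendChild.
    var frag = document.createDocumentFragment();
    for (var i = 0; i < batch.length; ++i)
      frag.appendChild(createMessageNode(batch[i]));
    output.appendChild(frag);
    output.scrollTop = output.scrollHeight;  // at most one layout flush
  }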
(In reply to Boris Zbarsky (:bz) from comment #11)
> Alright, so a few things.
> 
> The 50ms is in large part xpconnect overhead:
>   * 1/4 of total time is just JS-wrapping all the nsIStackFrame objects when
> we repeatedly
>     walk the .caller chain.
>   * Various security checks
>   * DefinePropertyIfFound for each prop on each stackframe (because
> HasProto() is false)
> 
> All told, that stuff adds up to 50us or so per log() call...  I wonder
> whether we can make this better somehow.  Can we move stack frames over to
> webidl?  ;)

I also noticed this. log() calls are very much slowed down by the code that reads the Components.stack frames. One of the things I did is to limit the depth of walking the stack - until I find the youngest JS frame of interest to the Web Console. This is still slowish. It would be really good if Gecko would be faster natively at giving us the stack frames.
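A hedged sketch of that depth-capped walk (MAX_DEPTH and isFrameOfInterest are hypothetical; this is chrome-privileged code):

  function findCallerFrame() {
    var frame = Components.stack.caller;
    for (var depth = 0; frame && depth < MAX_DEPTH; ++depth) {
      if (isFrameOfInterest(frame))
        return frame;  // youngest frame belonging to the page script
      frame = frame.caller;  // each .caller access JS-wraps an nsIStackFrame
    }
    return null;
  }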


> The freeze with 10000 calls is more web console code itself.  I'm seeing a
> ton of insertBefore _and_ removeChild in there, as well as layout flushes. 
> Are we just going through inserting all the things and then removing earlier
> ones when we insert later ones?  If so, can we optimize away part of that? 
> So if we know we allow 200 messages max, but we have 1000 in the queue, just
> drop the first 800 from the queue?

We do this - we prune the queue - but what can happen is that not all of the 10000 calls have arrived yet, so if only 1500 have arrived by the time the timer fires, we prune the queue then and start displaying, then later we prune again, and so on.

Not sure how I can avoid this problem?


> > displaying a few messages in a queue, every few ms, is a bit "weird".
> 
> I think the real issue I'm having is that we know the queue has 1000
> messages, but we still display only a few at a time.  This makes the total
> wall-clock time to get to the last message much higher...  Modulo possible
> jank issues, it seems the optimal behavior is:
>   1) Timer fires
>   2) Trim all but the last 200 messages from the queue
>   3) Insert all 200 nodes without doing any layout flushes

Ok, good point. So you believe it's much faster for us to just display all of the 200 nodes at once, not split them into smaller updates. Thanks! Will look into doing this for bug 761257.

Thank you for your analysis! This is very useful for us.
Blocks: 761257
> It would be really good if Gecko would be faster natively at giving us the stack frames.

Indeed.  Worth filing a separate bug on this...

> but what can happen is that not all of the 10000 calls have arrived yet

That shouldn't be the case here, right?  I'm just doing 10000 calls to log() in a loop; just like comment 0 but with a bigger loop counter.  They should all have arrived, I would think.

> So you believe it's much faster for us to just display all of the 200 nodes at once,
> not split them into smaller updates. 

If you only flush layout once for the 200 nodes, yes.  Faster in terms of overall throughput.  Responsiveness may be worse....
(In reply to Boris Zbarsky (:bz) from comment #13)
> > It would be really good if Gecko would be faster natively at giving us the stack frames.
> 
> Indeed.  Worth filing a separate bug on this...

Do you want to file this bug or shall I do that? What component?


> > but what can happen is that not all of the 10000 calls have arrived yet
> 
> That shouldn't be the case here, right?  I'm just doing 10000 calls to log()
> in a loop; just like comment 0 but with a bigger loop counter.  They should
> all have arrived, I would think.

This happens because messages arrive async to the Web Console as follows:

1. the console API defers processing of the console API calls and returns as quickly as possible to the content script. An nsITimer is used to flush the queue of logged console API calls and notify the observers (in this case the Web Console) of the console API method calls that happened.

2. the Web Console content script receives the notification from the console API code, processes the event and sends the information via the message manager to the Web Console chrome script - to the UI. Again this is async.

3. the Web Console UI code then displays the information coming from the Web Console content script. This also happens async, in batches of N messages every M milliseconds.

By the time of 3) the page script can do more calls to the console API - and it does so, because it's all in the loop.

If I do setTimeout(flushMyMessagesQueue, 50) in the Web Console code when the first message arrives, when flushMyMessagesQueue is invoked (after ~50ms) I do not have my queue full with all of the 10000 calls.
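For reference, a hedged sketch of stage 2 (the observer topic and message name here are illustrative, not necessarily the exact ones the code uses): the content script just re-posts each notification through the message manager, and that delivery is itself async, which is why the chrome-side timer can fire before the flood has fully crossed over:

  // Content-script side: forward each console API event to chrome.
  // sendAsyncMessage queues the message; the chrome listener runs in
  // a later event-loop tick. Topic and message names are illustrative.
  Components.utils.import("resource://gre/modules/Services.jsm");

  Services.obs.addObserver(function (subject, topic) {
    sendAsyncMessage("WebConsole:ConsoleAPICall",
                     { message: subject.wrappedJSObject });
  }, "console-api-log-event", false);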

> > So you believe it's much faster for us to just display all of the 200 nodes at once,
> > not split them into smaller updates. 
> 
> If you only flush layout once for the 200 nodes, yes.  Faster in terms of
> overall throughput.  Responsiveness may be worse....

Will see how this goes then. Thanks!
> Do you want to file this bug or shall I do that? What component?

Go for it.  XPConnect.

> By the time of 3) the page script can do more calls to the console API

Not quite.  More precisely, in this case, none of the async code runs until _all_ of the web content here is done running (modulo e10s).

So what we have after the web content is done running is tons of events in flight, right?  And those are then delivered to the web console UI.  It sounds like the web console UI then processes these in batches of N messages.  How is that part done?  I can see two implementation strategies, both using this function:

  function process(msgs) {
    // do stuff
    msgs.length = 0;
  }

Strategy one (in pseudo-JS):

  function receiveMessage(msg) {
    msgs[msgs.length] = msg;
    if (msgs.length == N) {
      process(msgs);
    }
  }

Strategy two:

  function receiveMessage(msg) {
    msgs[msgs.length] = msg;
    if (msgs.length == N) {
      setTimeout(process, 0, msgs);
    }
  }

which one is the Web Console doing?  Or is it doing something else?

> If I do setTimeout(flushMyMessagesQueue, 50) in the Web Console code when the first
> message arrives, when flushMyMessagesQueue is invoked (after ~50ms) I do not have my
> queue full with all of the 10000 calls.

That's ... very odd.
(In reply to Boris Zbarsky (:bz) from comment #15)
> > Do you want to file this bug or shall I do that? What component?
> 
> Go for it.  XPConnect.
> 
> > By the time of 3) the page script can do more calls to the console API
> 
> Not quite.  More precisely, in this case, none of the async code runs until
> _all_ of the web content here is done running (modulo e10s).
> 
> So what we have after the web content is done running is tons of events in
> flight, right?  And those are then delivered to the web console UI.  It
> sounds like the web console UI then processes these in batches of N
> messages.  How is that part done?  I can see two implementation strategies,
> both using this function:
> 
>   function process(msgs) {
>     // do stuff
>     msgs.length = 0;
>   }
> 
> Strategy one (in pseudo-JS):
> 
>   function receiveMessage(msg) {
>     msgs[msgs.length] = msg;
>     if (msgs.length == N) {
>       process(msgs);
>     }
>   }
> 
> Strategy two:
> 
>   function receiveMessage(msg) {
>     msgs[msgs.length] = msg;
>     if (msgs.length == N) {
>       setTimeout(process, 0, msgs);
>     }
>   }
> 
> which one is the Web Console doing?  Or is it doing something else?

What the Web Console does now:

http://mxr.mozilla.org/mozilla-central/source/browser/devtools/webconsole/HUDService.jsm#2492

We basically queue every message and try to output it instantly, but when the queue fills up with too many messages (MESSAGES_IN_INTERVAL) we start delaying updates with setTimeout().

I'm sure this code can be improved.
Thanks for the link.  So what this is doing is more like:

  function receiveMessage(msg) {
    if (been a while) {
      throttleValue = MESSAGES_IN_INTERVAL;
    }
    msgs[msgs.length] = msg;
    toDisplay = min(msgs.length, throttleValue);
    if (toDisplay == 0) {
      // Don't do anything now; just call back into this code sometime later
      return;
    }
    throttleValue -= toDisplay;
    // process the first toDisplay messages and post a timer to deal with the rest later
  }

so I would expect that for the first MESSAGES_IN_INTERVAL calls to this we would process things immediately, then we would start queueing up stuff... but by the time we're called off one of those timeouts we should have all 10000 messages.
(In reply to Boris Zbarsky (:bz) from comment #17)
> Thanks for the link.  So what this is doing is more like:
> 
>   function receiveMessage(msg) {
>     if (been a while) {
>       throttleValue = MESSAGES_IN_INTERVAL;
>     }
>     msgs[msgs.length] = msg;
>     toDisplay = min(msgs.length, throttleValue);
>     if (toDisplay == 0) {
>       // Don't do anything now; just call back into this code sometime later
>       return;
>     }
>     throttleValue -= toDisplay;
>     // process the first toDisplay messages and post a timer to deal with
> the rest later
>   }
> 
> so I would expect that for the first MESSAGES_IN_INTERVAL calls to this we
> would process things immediately, then we would start queueing up stuff...
> but by the time we're called off one of those timeouts we should have all
> 10000 messages.

Yep, but this is not what I was seeing - I had dump() calls in most of those places to see how the queuing happens. Now I am working on some changes to the code and I'll submit a patch when I have something interesting to show.
Depends on: 762995
Should we continue to keep this bug open? Can someone please retest and let us know if there are any specific performance issues that still need to be addressed? Thank you!
Might as well close this and file any other issues as separate bugs.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Product: Firefox → DevTools