Open Bug 1356686 Opened 8 years ago Updated 1 year ago

OMT brotli decompression

Categories

(Core :: Networking: HTTP, enhancement, P3)

enhancement

Tracking

()

Performance Impact high

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [necko-triaged])

Attachments

(1 obsolete file)

See this profile for example: <https://perfht.ml/2og8OSq> I captured it by loading the facebook homepage. We spend 53ms on the main thread decompressing the brotli stream. I bet this is a lot worse on a slow machine. It seems we shouldn't be running decompression code on the target thread of the Necko data delivery. Even if the target thread isn't the main thread (for example for the HTML parser) it seems that blocking HTML parsing for data decompression is a pretty bad idea if the machine does have free cores to run more than one thing concurrently.
Marking as [qf:investigate:p1] because it's premature to assume that simply moving this OMT is going to make things faster, of course.
Whiteboard: [qf:p1] → [qf:investigate:p1]
BTW I have another profile as well: https://perfht.ml/2ofWYYD This is super easy to reproduce, FTR.
Whiteboard: [qf:investigate:p1] → [qf:investigate:p1][necko-next]
nsHTTPCompressConv doesn't implement nsIThreadRetargetableStreamListener so OMT is disabled if decompression is required. The first thing to try is to reenable OMT for this scenario, i.e. move the decompression overhead to parser thread instead of blocking main thread. I am not sure if moving decompression to another 'converter' thread will further improve the performance, since the HTML parser will still be waiting for the decompressed data. The page loading and rendering for HTML will still be blocked anyway. However, this will benefit the scenarios that OMT is disabled by other reason. It'll need further experiment to understand the benefit.
I'm not confident this will end up helping a lot either, but there are cases where we are fetching multiple compressed resources in parallel (js, css, html all commonly compressed) - so helper threads could at least give some parallelism.. but honestly this processing happens pretty fast so it might not be a win after the overhead.
FWIW I have seen this take 1-5 frames quite regularly in various profiles over the past few months, for example on Facebook. The example profile in bug 1367666 reminded me to comment here. It would really be nice if we fixed this, this really does show up a lot on real sites.
After bug 1357678 landed, part of the time we spent on brotli decompression is off-main-thread now (for example HTML parsing and image decoding). The reset of brotli decompression loading on main thread is mainly for loading JS and CSS. Bug 1380218 and bug 1381425 were filed previously for further OMT JS/CSS loading.
Priority: -- → P2
Keywords: perf
Dragana - is this something you can tackle?
Flags: needinfo?(dd.mozilla)

Junior - is this something you could investigate?

Flags: needinfo?(juhsu)
Flags: needinfo?(dd.mozilla)

(In reply to Selena Deckelmann :selenamarie :selena use ni? pronoun: she from comment #9)

Junior - is this something you could investigate?

Given Comment 4 and Comment 6, IMO it could be a P3 for now. It could be my backlog but I'm not familiar with the pb-Http setup.

Flags: needinfo?(juhsu)
Priority: P2 → P3
Whiteboard: [qf:investigate:p1][necko-next] → [qf:investigate:p1][necko-triaged]

QF triage team has reviewed this bug and it seems like a pageload issue.

Whiteboard: [qf:investigate:p1][necko-triaged] → [qf:p1:pageload][necko-triaged]

I think moving decompression off the main thread would help quite a bit.
Here's a more recent profile with 175ms spent inside nsHTTPCompressConv::OnDataAvailable: http://bit.ly/2ShNR8P
This is from loading https://www.facebook.com/DwayneJohnson .

Generally, the main thread is always busy during page load; the more work we can move off of it, the better. And things that are already asynchronous from the main thread's perspective (such as network requests) are a particularly attractive target.

Assignee: nobody → brennie
Status: NEW → ASSIGNED

Hi :dragana,

What all is required to accomplish this? Is it sufficient to read the data from the HTTP stream into a buffer and then pass that off to a thread to do the decoding?

Flags: needinfo?(dd.mozilla)

(In reply to Barret Rennie [:brennie] from comment #13)

Hi :dragana,

What all is required to accomplish this? Is it sufficient to read the data from the HTTP stream into a buffer and then pass that off to a thread to do the decoding?

we have away to deliver onDataAvailable off-mail-thread, but all listeners must agree.
For listeners that require onDataAvailable to be deliver on the main thread but are encoded to do the following:

  1. deliver onDataAvailabe to the decoder, to nsHTTPCompressConv, of the main thread.
  2. the decoder should dispatch decoded data to the main thread.

we need to be careful to properly cancel data delivery from the decoder if channel gets canceled.

I think this should work.

Flags: needinfo?(dd.mozilla)

So I ran some try runs of the patch I posted above (and some variants) to do some try runs and here are the results:

Always move off-main-thread, all decompression
This is the current version of the patch posted for review. If the listener chain for nsHTTPCompessConv doesn't support off-main-thread OnDataAvailable, nsHTTPCompressConv::OnDataAvailable will dispatch a task to call its listener on the main thread.
Regresses the following:

  • raptor-tp6-imgur-firefox opt 4.06%
  • raptor-tp6-instagram-firefox opt 27.92%
  • raptor-tp6-linkedin-firefox opt 2%
  • raptor-tp6-sheets-firefox opt 3%
  • raptor-tp6-yahoo-news-firefox opt 2.62%

Only move off-main-thread when entire listener chain supports, all decompression
Regresses the following:

  • raptor-tp6-imgur-firefox opt 5.18%
  • raptor-tp6-instagram-firefox opt 28.44%
    Improves the following:
  • raptor-tp6-wikipedia-firefox opt 2.28%
    Improves the following:
  • raptor-tp6-google-firefox opt 3.10%

Always move off-main-thread, only Brotli
Like the patch posted for review, this will always re-target off-the-main thread and dispatch the listener back to the main thread when it doesn't support re-targeting.
Regresses the following:
raptor-tp6-facebook-firefox-cold opt 33.20%

Improves the following:

  • raptor-tp6-google-firefox opt 3.15%
  • raptor-tp6-reddit-firefox opt 3.48%
  • raptor-tp6-youtube-firefox opt 3.10%

Only move off-main-thread when entire listener chain supports, only Brotli
Regresses the following:

  • raptor-tp6-facebook-firefox-cold opt 49.73%
  • raptor-tp6-imgur-firefox opt 4.44%
  • raptor-tp6-instagram-firefox opt 30.12%
  • raptor-tp6-linkedin-firefox opt 2.21%
Attachment #9068169 - Attachment is obsolete: true
Assignee: brennie → nobody
Status: ASSIGNED → NEW

I'm going to get this compiling again and re-evaluate it with our latest performance tests which include features like visual metrics.

Current nightly still seeing about 65ms in the content process during facebook pageload on a MacBook Pro.
https://share.firefox.dev/3eyoWLl

I've gotten the patches compiling again.

Nazim was kind enough to capture new profiles with these patches:

Baseline, 109ms of Brotli decompression on content process main thread:
https://share.firefox.dev/3ny3zhd

With Brotli moved omt: now only 4ms of Brotli decompression on content process main thread:
https://share.firefox.dev/3sXGHZx

Awaiting visual metric runs on try.

We have some interesting results from visual metrics.

I only rebuilt two of the variations from Comment 16
Always move off-main-thread, all decompression

While this regresses many sites, there are also numerous improvements of up to 50% in visual metrics.

Always move off-main-thread, only Brotli decompression

This regresses more sites than it improves, which is interesting.

I think a good next step would be to compare these variations on larger set of pages, ideally using live sites.

Performance Impact: --- → P1
Keywords: perf:pageload
Whiteboard: [qf:p1:pageload][necko-triaged] → [necko-triaged]

The Performance Priority Calculator has determined this bug's performance priority to be P1.

Platforms: macOS
Impact on site: Causes noticeable jank
Page load impact: Some
[x] Affects major website
[x] Able to reproduce locally

While this would be useful right now, to my understanding it will no longer be needed once we can move the network calls off the content process main thread, Bug 1639433.

Severity: normal → S3

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:jesup, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to ??

For more information, please visit auto_nag documentation.

Flags: needinfo?(rjesup)
Depends on: omt-content
Flags: needinfo?(rjesup)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: