1356686 - OMT brotli decompression

Reporter

Description

•

8 years ago

See this profile for example: <https://perfht.ml/2og8OSq> I captured it by loading the facebook homepage. We spend 53ms on the main thread decompressing the brotli stream. I bet this is a lot worse on a slow machine. It seems we shouldn't be running decompression code on the target thread of the Necko data delivery. Even if the target thread isn't the main thread (for example for the HTML parser) it seems that blocking HTML parsing for data decompression is a pretty bad idea if the machine does have free cores to run more than one thing concurrently.

(no longer active)

Reporter

Comment 1

•

8 years ago

Marking as [qf:investigate:p1] because it's premature to assume that simply moving this OMT is going to make things faster, of course.

Whiteboard: [qf:p1] → [qf:investigate:p1]

(no longer active)

Reporter

Comment 2

•

8 years ago

BTW I have another profile as well: https://perfht.ml/2ofWYYD This is super easy to reproduce, FTR.

Dragana Damjanovic [:dragana]

Updated

•

8 years ago

Whiteboard: [qf:investigate:p1] → [qf:investigate:p1][necko-next]

Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz)

Comment 3

•

8 years ago

nsHTTPCompressConv doesn't implement nsIThreadRetargetableStreamListener so OMT is disabled if decompression is required. The first thing to try is to reenable OMT for this scenario, i.e. move the decompression overhead to parser thread instead of blocking main thread. I am not sure if moving decompression to another 'converter' thread will further improve the performance, since the HTML parser will still be waiting for the decompressed data. The page loading and rendering for HTML will still be blocked anyway. However, this will benefit the scenarios that OMT is disabled by other reason. It'll need further experiment to understand the benefit.

Patrick McManus [:mcmanus]

Comment 4

•

8 years ago

I'm not confident this will end up helping a lot either, but there are cases where we are fetching multiple compressed resources in parallel (js, css, html all commonly compressed) - so helper threads could at least give some parallelism.. but honestly this processing happens pretty fast so it might not be a win after the overhead.

(no longer active)

Reporter

Comment 5

•

7 years ago

FWIW I have seen this take 1-5 frames quite regularly in various profiles over the past few months, for example on Facebook. The example profile in bug 1367666 reminded me to comment here. It would really be nice if we fixed this, this really does show up a lot on real sites.

Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz)

Comment 6

•

7 years ago

After bug 1357678 landed, part of the time we spent on brotli decompression is off-main-thread now (for example HTML parsing and image decoding). The reset of brotli decompression loading on main thread is mainly for loading JS and CSS. Bug 1380218 and bug 1381425 were filed previously for further OMT JS/CSS loading.

Firefox Bug Husbandry Bot

Comment 7

•

7 years ago

Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258

Priority: -- → P2

Firefox Bug Husbandry Bot

Updated

•

7 years ago

Keywords: perf

Selena Deckelmann :selenamarie :selena

Comment 8

•

6 years ago

Dragana - is this something you can tackle?

Flags: needinfo?(dd.mozilla)

Selena Deckelmann :selenamarie :selena

Comment 9

•

6 years ago

Junior - is this something you could investigate?

Flags: needinfo?(juhsu)

Selena Deckelmann :selenamarie :selena

Updated

•

6 years ago

Flags: needinfo?(dd.mozilla)

Junior [inactive]

Comment 10

•

6 years ago

(In reply to Selena Deckelmann :selenamarie :selena use ni? pronoun: she from comment #9)

Junior - is this something you could investigate?

Given Comment 4 and Comment 6, IMO it could be a P3 for now. It could be my backlog but I'm not familiar with the pb-Http setup.

Flags: needinfo?(juhsu)

Priority: P2 → P3

Whiteboard: [qf:investigate:p1][necko-next] → [qf:investigate:p1][necko-triaged]

Jean Gong :jgong

Comment 11

•

6 years ago

QF triage team has reviewed this bug and it seems like a pageload issue.

Whiteboard: [qf:investigate:p1][necko-triaged] → [qf:p1:pageload][necko-triaged]

Markus Stange [:mstange]

Comment 12

•

6 years ago

I think moving decompression off the main thread would help quite a bit.
Here's a more recent profile with 175ms spent inside nsHTTPCompressConv::OnDataAvailable: http://bit.ly/2ShNR8P
This is from loading https://www.facebook.com/DwayneJohnson .

Generally, the main thread is always busy during page load; the more work we can move off of it, the better. And things that are already asynchronous from the main thread's perspective (such as network requests) are a particularly attractive target.

Barret Rennie [:barret] (they/them)

Updated

•

6 years ago

Assignee: nobody → brennie

Status: NEW → ASSIGNED

Barret Rennie [:barret] (they/them)

Comment 13

•

6 years ago

Hi :dragana,

What all is required to accomplish this? Is it sufficient to read the data from the HTTP stream into a buffer and then pass that off to a thread to do the decoding?

Flags: needinfo?(dd.mozilla)

Dragana Damjanovic [:dragana]

Comment 14

•

6 years ago

(In reply to Barret Rennie [:brennie] from comment #13)

Hi :dragana,

What all is required to accomplish this? Is it sufficient to read the data from the HTTP stream into a buffer and then pass that off to a thread to do the decoding?

we have away to deliver onDataAvailable off-mail-thread, but all listeners must agree.
For listeners that require onDataAvailable to be deliver on the main thread but are encoded to do the following:

deliver onDataAvailabe to the decoder, to nsHTTPCompressConv, of the main thread.
the decoder should dispatch decoded data to the main thread.

we need to be careful to properly cancel data delivery from the decoder if channel gets canceled.

I think this should work.

Flags: needinfo?(dd.mozilla)

Barret Rennie [:barret] (they/them)

Comment 15

•

6 years ago

Attached file Bug 1356686 - Do decompression off main thread in content r?dragana,mayhemer (obsolete) (deleted) — Details

Barret Rennie [:barret] (they/them)

Comment 16

•

5 years ago

So I ran some try runs of the patch I posted above (and some variants) to do some try runs and here are the results:

Always move off-main-thread, all decompression
This is the current version of the patch posted for review. If the listener chain for nsHTTPCompessConv doesn't support off-main-thread OnDataAvailable, nsHTTPCompressConv::OnDataAvailable will dispatch a task to call its listener on the main thread.
Regresses the following:

raptor-tp6-imgur-firefox opt 4.06%
raptor-tp6-instagram-firefox opt 27.92%
raptor-tp6-linkedin-firefox opt 2%
raptor-tp6-sheets-firefox opt 3%
raptor-tp6-yahoo-news-firefox opt 2.62%

Only move off-main-thread when entire listener chain supports, all decompression
Regresses the following:

raptor-tp6-imgur-firefox opt 5.18%
raptor-tp6-instagram-firefox opt 28.44%
Improves the following:
raptor-tp6-wikipedia-firefox opt 2.28%
Improves the following:
raptor-tp6-google-firefox opt 3.10%

Always move off-main-thread, only Brotli
Like the patch posted for review, this will always re-target off-the-main thread and dispatch the listener back to the main thread when it doesn't support re-targeting.
Regresses the following:
raptor-tp6-facebook-firefox-cold opt 33.20%

Improves the following:

raptor-tp6-google-firefox opt 3.15%
raptor-tp6-reddit-firefox opt 3.48%
raptor-tp6-youtube-firefox opt 3.10%

Only move off-main-thread when entire listener chain supports, only Brotli
Regresses the following:

raptor-tp6-facebook-firefox-cold opt 49.73%
raptor-tp6-imgur-firefox opt 4.44%
raptor-tp6-instagram-firefox opt 30.12%
raptor-tp6-linkedin-firefox opt 2.21%

Phabricator Automation

Updated

•

4 years ago

Attachment #9068169 - Attachment is obsolete: true

Barret Rennie [:barret] (they/them)

Updated

•

4 years ago

Assignee: brennie → nobody

Status: ASSIGNED → NEW

Andrew Creskey [:acreskey]

Comment 17

•

4 years ago

I'm going to get this compiling again and re-evaluate it with our latest performance tests which include features like visual metrics.

Andrew Creskey [:acreskey]

Comment 18

•

4 years ago

Current nightly still seeing about 65ms in the content process during facebook pageload on a MacBook Pro.
https://share.firefox.dev/3eyoWLl

Andrew Creskey [:acreskey]

Comment 19

•

4 years ago

I've gotten the patches compiling again.

Nazim was kind enough to capture new profiles with these patches:

Baseline, 109ms of Brotli decompression on content process main thread:
https://share.firefox.dev/3ny3zhd

With Brotli moved omt: now only 4ms of Brotli decompression on content process main thread:
https://share.firefox.dev/3sXGHZx

Awaiting visual metric runs on try.

Andrew Creskey [:acreskey]

Comment 20

•

4 years ago

We have some interesting results from visual metrics.

I only rebuilt two of the variations from Comment 16
Always move off-main-thread, all decompression

While this regresses many sites, there are also numerous improvements of up to 50% in visual metrics.

Always move off-main-thread, only Brotli decompression

This regresses more sites than it improves, which is interesting.

Andrew Creskey [:acreskey]

Comment 21

•

4 years ago

I think a good next step would be to compare these variations on larger set of pages, ideally using live sites.

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P1

Keywords: perf:pageload

Whiteboard: [qf:p1:pageload][necko-triaged] → [necko-triaged]

Frank Doty [:fdoty|on_leave]

Comment 22

•

2 years ago

The Performance Priority Calculator has determined this bug's performance priority to be P1.

Platforms: macOS
Impact on site: Causes noticeable jank
Page load impact: Some
[x] Affects major website
[x] Able to reproduce locally

Keywords: perf → perf:responsiveness, reproducible, top50

Andrew Creskey [:acreskey]

Comment 23

•

2 years ago

While this would be useful right now, to my understanding it will no longer be needed once we can move the network calls off the content process main thread, Bug 1639433.

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

BugBot [:suhaib / :marco/ :calixte]

Comment 24

•

2 years ago

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:jesup, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to ??

For more information, please visit auto_nag documentation.

Flags: needinfo?(rjesup)

Greg Hess

Updated

•

2 years ago

Depends on: omt-content

Greg Hess

Updated

•

2 years ago

Flags: needinfo?(rjesup)

Mayank Bansal

Updated

•

1 year ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1851351