Closed Bug 650189 Opened 14 years ago Closed 14 years ago

Recent performance regression in loading huge tinderbox logs

Categories

(Core :: Layout: Text and Fonts, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
firefox5 + fixed

People

(Reporter: ehsan.akhgari, Assigned: smontagu)

References

()

Details

(Keywords: dogfood, perf, regression)

Attachments

(1 file)

See the URL for example (warning: it might kill your browser). I've recently (the past couple of weeks maybe) am seeing a huge regression in loading these pages. I leave Firefox for like 5 minutes and then kill it, because it's absolutely unresponsive. I don't have an --enable-profiling opt build to profile this with right now, but shark on our (useless for profiling) nightlies shows me some nsTextFrame stuff. This is the worst regression I've ever seen. Can somebody profile please?
Simon is the most likely culprit :-)
Assignee: nobody → smontagu
I was loading merrily tinderbox logs on a nightly no older than a couple of days.
> but shark on our (useless for profiling) nightlies http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-6.0a1.en-US.mac-shark.dmg tends to have a useful for profiling nightly. I'll take a look if someone doesn't beat me to it, but I suspect regression range may be more useful than profile.
Please double check the following range. Regression window(m-c hourly): Works: http://hg.mozilla.org/mozilla-central/rev/d114ab94a859 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110411 Firefox/4.2a1pre ID:20110411100945 Broken: http://hg.mozilla.org/mozilla-central/rev/f65b79eeabd4 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110411 Firefox/4.2a1pre ID:20110411101509 Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d114ab94a859&tochange=f65b79eeabd4 Regression window(cedar hourly): works: http://hg.mozilla.org/projects/cedar/rev/db98c4f46347 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110409 Firefox/4.2a1pre ID:20110410231038 Broken: http://hg.mozilla.org/projects/cedar/rev/2ad43389d244 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.2a1pre) Gecko/20110409 Firefox/4.2a1pre ID:20110411010443 Pushlog: http://hg.mozilla.org/projects/cedar//pushloghtml?fromchange=db98c4f46347&tochange=2ad43389d244
(In reply to comment #4) > Regression window(cedar hourly): > Pushlog: > http://hg.mozilla.org/projects/cedar//pushloghtml?fromchange=db98c4f46347&tochange=2ad43389d244 Which is smontagu, as roc suspected. thanks alice
(In reply to comment #5) > (In reply to comment #4) > > Regression window(cedar hourly): > > Pushlog: > > http://hg.mozilla.org/projects/cedar//pushloghtml?fromchange=db98c4f46347&tochange=2ad43389d244 > > Which is smontagu, as roc suspected. thanks alice Yeah, makes sense to me. We should investigate to see if this problem can show up in other web pages, and back out bug 263359 from aurora.
Blocks: 263359
Very strange, because all my testing showed a huge improvement in loading tinderbox logs from bug 263359. Why is this tinderbox log different from all other tinderbox logs?
Ah, I was forgetting that bug 646359 got split out from bug 263359. Testing with the patch from there applied makes the test URL load speedily for me (a local copy of it, that is). I'm still surprised that bug 263359 made things significantly worse, though.
Depends on: 646359
On second thoughts, testing with the local copy doesn't prove anything because it gets decoded as Latin1
Bug 646359 doesn't help with this.
No longer depends on: 646359
So this is basically the same old problem with incremental bidi resolution in long paragraphs that we have talked about in the past, e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=566066#c2. Bug 263359, which was supposed to help with that, was a performance win with smaller files, but it seems that there is a critical mass above which doing bidi resolution on each line separately takes longer than doing it on the whole block. FTR, when I saw performance improvements I was testing with tinderbox logs around 10MB, 85K lines; the log in the URL is just under 32MB, 146K lines.
Does that mean that we can't effectively get good perf in both cases (long lines and large blocks)?
I'm sure there are other things we can do. One idea that I want to investigate is a NS_LINE_NEEDS_BIDI_RESOLUTION flag for preformatted elements, similar to the NS_BLOCK_NEEDS_BIDI_RESOLUTION that we already use.
Why is doing bidi resolution on each line separately a *huge* regression from doing it on the whole block? That sounds surprising to me. Are we sure that's all that's going on here?
Why are we nominating this for Aurora? What are the risks?
Will track, we need to get to a solid root cause analysis and find out if backout or something else is a good strategy.
(In reply to comment #15) > Why are we nominating this for Aurora? What are the risks? We know that the patches landed in bug 263359 (which is on Aurora) have caused huge performance regressions for at least Tinderbox log pages. We don't know whether this regression can also be observed on other (regular) web pages or not. We need to track this on Aurora, to make sure that we back out bug 263359 if we need to.
Attached patch w-i-p (deleted) — Splinter Review
It's cool when working on one bug provides illumination for another one... At least part of this was caused by moving the initialization of the nsBlockInFlowLineIterator from Resolve() into ResolveParagraph() in http://hg.mozilla.org/mozilla-central/rev/ad5009233ae9. That can't be the whole story though, because it doesn't fit the regression range, and wasn't checked in to Aurora. This patch also regresses bug 650189 :(
(In reply to comment #18) > This patch also regresses bug 650189 :( This is bug 650189.
The patch regress *bug 645193*
I think we should do a backout to fix this, both on Aurora and mozilla-central.
Can someone double check whether the regression exists in Aurora before backing out there? I'm still seeing improvement rather than regression in Aurora compared to FF4, but my network connection speed is very irregular and I don't know how reliable my data are.
It should be possible to simulate a testcase with a script loop that document.write()s chunks of text and then calls document.body.offsetHeight to flush reflow periodically.
Using the link in the URL field, I can reproduce the exact regression on Aurora as well. Things *might* be different under slow connections, in which case roc's suggestion in comment 23 might help, but I think that we have enough information now to assert that this regression exists both on mozilla-central and Aurora.
Since there is no proposed fix, please backout as per #21 as soon as possible.
(In reply to comment #25) > Since there is no proposed fix, please backout as per #21 as soon as possible. Simon, can you please take care of this? I'm not sure which patches to back out exactly.
http://hg.mozilla.org/mozilla-central/rev/fa0295a97f1b Ehsan, can you confirm that the backout fixes the regression?
(In reply to comment #28) > http://hg.mozilla.org/mozilla-central/rev/fa0295a97f1b > > Ehsan, can you confirm that the backout fixes the regression? It does, yes. Thanks!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: