Open Bug 1695530 Opened 3 years ago Updated 2 years ago

Firefox endlessly spins for XML (59MB, ~180,000 lines) but renders from 81.0.2 backwards

Categories

(Core :: Layout: Generated Content, Lists, and Counters, defect, P4)

Firefox 82
x86_64
Linux
defect

Tracking

()

People

(Reporter: all.along.the.watchtower2001, Unassigned)

References

(Regression)

Details

(Keywords: leave-open, perf, regression)

Attachments

(2 files)

Attached file 2021-02-28-firefox-xml-test.zip (deleted) —

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0

Steps to reproduce:

Noticed a change in behavior between Firefox 81.0.2 (and backwards) up to current nightly 88.0a1 (2021-02-27) (64-bit) (so 82 -> 88) where large XML files are causing Firefox to endlessly spin up until a crash after a long period of time.

The problem was first noticed with a complex XML schema but I was able to create the attached mock with a fairly small number of branches which also simulates the behavior.

Test results are as follows:

Firefox 79: 50 seconds test xml.
Firefox 80: 1 min test xml.
Firefox 81: 1 minute test xml
Firefox 81.0.2: 1 minute test xml.
Firefox 82: Endless spinning <-- first version with issue.
Firefox 83: Endless spinning.
Firefox 84: Endless spinning.
Firefox 85: Endless spinning.
Firefox 86: Endless spinning.
88.0a1 (2021-02-27) (64-bit): Endless spinning.

The following (not very sophisticated) script was used to create slightly different sized mocks to find the right sized XML to test without crashing everything.

print("<?xml version='1.0' encoding='UTF-8'?>")
print("<book>")
for x in range(450000):
     print("   <summary>this is a summary line that is fairly long</summary>")
     print("   <chapter>")
     print("      <page>this is a nested line</page>")
     print("   </chapter>")
print("</book>")

If there is a way I can get more logging from Firefox to see where this is spinning on the back-end the advice is appreciated and I can send that along.

Closest Bugzilla ticket I have found is: https://bugzilla.mozilla.org/show_bug.cgi?id=1318272 but this seems to be an older legacy issue.

Actual results:

Firefox 81: Load the attached XML. It will render in ~1 minute.
Firefox 82->onwards: Load the attached XML: It will endlessly spin.

Expected results:

While it is acknowledged the XML is large - the expected behavior is to load and/or provide feedback on why the XML cannot be processed.

OS: Unspecified → Linux
Hardware: Unspecified → x86_64

Machine information:

  • 4.0 GHz i7-8550U (1.8 up to 4.0 GHz – 8MB Cache – 4 Cores – 8 Threads)
  • 32 GB Dual Channel DDR4 at 2400MHz (2× 16 GB)

The Bugbug bot thinks this bug should belong to the 'Core::Printing: Output' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Printing: Output
Product: Firefox → Core

If you could use https://mozilla.github.io/mozregression/ to find a regression range, that would be very helpful I imagine.

Seems like the bot got confused, doesn't seem related to printing. But yeah, a regression window is the best step forward here. Thanks for filing this!

Component: Printing: Output → General
QA Whiteboard: [qa-regression-triage]

thanks Robert. I didn't know about mozregression. It's pretty cool. I ran through the test twice to get used to it. The changeset looks to be identified as follows:

2021-03-01T08:45:26.355000: INFO : Narrowed integration regression window from [33b1ed31, edb1d37f] (3 builds) to [33b1ed31, 75559571] (2 builds) (~1 steps left)
2021-03-01T08:45:26.364000: DEBUG : Starting merge handling...
2021-03-01T08:45:26.364000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=75559571662961f502e2f424cc3b2d7572c50bb5&full=1
2021-03-01T08:45:26.365000: DEBUG : redo: attempt 1/3
2021-03-01T08:45:26.365000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=75559571662961f502e2f424cc3b2d7572c50bb5&full=1',), kwargs: {}, attempt #1
2021-03-01T08:45:26.373000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2021-03-01T08:45:27.528000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=75559571662961f502e2f424cc3b2d7572c50bb5&full=1 HTTP/1.1" 200 None
2021-03-01T08:45:27.533000: DEBUG : Found commit message:
Bug 1548753 - [css-lists] Prefer an ancestor's counter scope over a sibling's scope.  r=emilio

https://github.com/w3c/csswg-drafts/issues/5477

Differential Revision: https://phabricator.services.mozilla.com/D87991

2021-03-01T08:45:27.533000: DEBUG : Did not find a branch, checking all integration branches
2021-03-01T08:45:27.538000: INFO : The bisection is done.
2021-03-01T08:45:27.539000: INFO : Stopped
Regressed by: 1548753
Has Regression Range: --- → yes
Component: General → Layout: Generated Content, Lists, and Counters

Mats, can you take a look?

Flags: needinfo?(mats)

The disclosure things in the XML source view are <summary> elements, so we get a list-item per "chapter". So we basically have one long flat list (~1M nodes) of something like this:

Counter Manager Lists:
Counter named "list-item":
  Node #0 @0x7f4146dddf40 frame=0x7f4146dee1a8 index=-1431655764 type=INCREMENT valAfter=0
       scope-start=(nil) scope-prev=(nil)
  Node #1 @0x7f4146da19e0 frame=0x7f4146dee270 index=0 type=USE valAfter=0
       scope-start=0x7f4146dddf40 scope-prev=0x7f4146dddf40 text=0
  Node #2 @0x7f4146dddfa0 frame=0x7f414634e020 index=-1431655764 type=INCREMENT valAfter=0
       scope-start=0x7f4146dddf40 scope-prev=0x7f4146da19e0
  Node #3 @0x7f4146da1ac0 frame=0x7f414634e0e8 index=0 type=USE valAfter=0
       scope-start=0x7f4146dddf40 scope-prev=0x7f4146dddfa0 text=0
  Node #4 @0x7f4146352040 frame=0x7f414634fa68 index=-1431655764 type=INCREMENT valAfter=0
       scope-start=0x7f4146dddf40 scope-prev=0x7f4146da1ac0
  Node #5 @0x7f4146da1b30 frame=0x7f414634fb30 index=0 type=USE valAfter=0
       scope-start=0x7f4146dddf40 scope-prev=0x7f4146352040 text=0
  Node #6 @0x7f41463520a0 frame=0x7f4146355560 index=-1431655764 type=INCREMENT valAfter=0
       scope-start=0x7f4146dddf40 scope-prev=0x7f4146da1b30
...

(Sigh. It's increasingly clear to me the decision that <summary> should have display:list-item just to implement the disclosure symbol was a huge mistake...)

Anyway, we need to optimize this for huge HTML lists too, but still...

Assignee: nobody → mats
Severity: -- → S4
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(mats)
Keywords: perf
Priority: -- → P4

Also, fix Dump() so that it logs the right type for INCREMENT/SET nodes.

Keywords: leave-open
Pushed by mpalmgren@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5ba8f4cf0a26
part 1 - Don't create an INCREMENT counter node when the increment is zero because it has no effect.  r=emilio

Note, bug 1719745 has some extremely-reduced testcases, via "view-source", which uses a list counter for source-line-numbering.

(1) testcase with 200,000 lines (view-source hangs my Firefox instance for longer than I care to wait, maybe forever):
https://bugzilla.mozilla.org/attachment.cgi?id=9234538

(2) testcase with 20,000 lines (view-source hangs my Firefox instance for 2-3 seconds):
https://bugzilla.mozilla.org/attachment.cgi?id=9234539

(Mats: given that you noted that your landed "part 1" patch should fix the problem completely for this bug's original reported case, maybe we should close this bug as fixed and use bug 1719745 or a new bug to track the remaining optimization here? That might make things a little easier to reason about & track, given that a substantial amount of time has passed since part 1 landed. Up to you, though.)

Flags: needinfo?(mats)

I've attached a patch in bug 1712413 which I think should fix the perf regression.

Leaving this open for now since we might want to backout/tweak the wallpaper we landed in this bug, depending on what we end up doing for reverse counters (i.e. should counter-increment: x 0 affect the start value of a reverse counter x, or not?).

Flags: needinfo?(mats)

The leave-open keyword is there and there is no activity for 6 months.
:emilio, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(emilio)
Flags: needinfo?(emilio)

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: MatsPalmgren_bugz → nobody
Status: ASSIGNED → NEW
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: