231925 - xml pretty printing is more bloated than necessary

Reporter

Description

•

21 years ago

I bet we could cut down on the additional bloat compared to IE, as mentioned in bug 197956, if we get rid of the view-source HTML and instead use plain XML and CSS. We could get rid of all attributes, and I bet we could get rid of the span for the '=' in attrs in favour of CSS :before {content: "=";}. Not sure if we should get rid of the table for the expander, too. It may suffice to just use CSS display and not expose the full table elements. Having simpler markup might eventually ease the transition to a non-XSLT prettyprinter, as I don't think we can fix bug 175946 with XSLT. I'll take a stab at this one.

Axel Hecht

Reporter

Updated

•

21 years ago

Blocks: 197956

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 1

•

21 years ago

Comment 2

•

21 years ago

On a 600k XML file, I am about four times as fast, at least when I compare pretty-printing with a version that uses an XSLT PI linking to my stylesheet. I removed all the HTML, so that we don't go through the rather expensive HTML content generation. I removed almost all attributes and use just plain XML elements, which should get us a significant improvement in size. The generated tree is a good deal shallower. I removed the predicates from the tests in favour of xsl:choose, and added priorities so that we deal with texts and elements first, then comments, then PIs and then documents. The expander is done in XBL alone, so the call-template died, too.

Of course, all of this is nothing until I manage to get collapsing undone. I need to put a testcase online so that I can get some info from layout folks on why this thing is acting up. (Note that the collapsing and expanding is faster than the stuff IE does.)

Jonas, do you notify an observer for each expandable element in the generated doc? If so, why did you do that? It seems like that is causing another 20% of the total time or so.

Axel Hecht

Reporter

Comment 3

•

21 years ago

Some numbers, Solaris 1.6 build (so no IE comparison, but you'll get the idea):

TestGTKEmbed about:blank: 25MB
Testfile with 2k nodes and 600k data size (about one third of http://bugzilla.mozilla.org/show_bug.cgi?id=197956#c12):
pretty-printing takes 8:30 mins and 92MB
XSLT with my mods takes 2:40 mins and 69MB

Axel Hecht

Reporter

Comment 4

•

21 years ago

Pretty-printing makes it up to elem27, my version makes it up to elem98, though it's getting pretty confused when one scrolls the other elements into view. Dang, I get broken layout all over my ass.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 5

•

21 years ago

Would you be able to get any numbers on how the new stylesheet compares to the old one in a post-optimized-XPath world? The reason I'm asking is that (at least for some testcases) optimized XPath makes the old stylesheet 6 times faster, whereas this new one is "only" 4 times faster and, from your description, doesn't seem to benefit as much from optimized XPath.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 6

•

21 years ago

> Jonas, do you notify an observer to each expandable element in the generated
> doc? If so, why did you do that? It seems like that is causing another
> 20% of the total time or so.

Not really sure what you mean, but as far as I remember I don't perform any notifications manually at all.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 7

•

21 years ago

The XSLT code does send out a lot of notifications during content creation though; that's covered by bug 221335.

Axel Hecht

Reporter

Comment 8

•

21 years ago

(In reply to comment #5)
> Would you be able to get any numbers on how the new stylesheet compares to the
> old one in a post optimized-xpath world? The reason i'm asking is that (at least
> for some testcases) optimized xpath makes the old stylesheet 6 times faster,
> whereas this new one is "only" 4 times faster and from your description doesn't
> seem to get as good benefit from optimized xpath.

As I mentioned in our favorite bug 197956, we shouldn't confuse factors for differently sized testcases. I just didn't take the time to wait for the full testcase, so I cut down its size. That of course brings down the factor if the two approaches scale differently. I'll first try to get to the bottom of the odd layout behaviour before I start attaching testcases.

Btw, IE has pretty odd layout problems with their stylesheet and deep documents, too. Hihihi.

Axel Hecht

Reporter

Comment 9

•

21 years ago

Numbers on Nested_Chapter_Test.xml:

pp: 72MB, 4:47
me: 54MB, 1:30

That's roughly a factor of 3. http://bugzilla.mozilla.org/show_bug.cgi?id=208172#c3 says *6 for xpath optim, but I think that is from pre-walker days. Note that speedups do vary from architecture to architecture, too, as memory vs. CPU speed changes, as well as alloc overhead and such.
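As a rough check of the "factor 3" above, the arithmetic can be sketched in a few lines of Python. The helper names (`to_seconds`, `speedup`) are made up for illustration; only the timings and memory figures come from this comment.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '4:47' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(old: str, new: str) -> float:
    """How many times faster the new run is compared to the old run."""
    return to_seconds(old) / to_seconds(new)


# Figures quoted for Nested_Chapter_Test.xml in this comment.
time_factor = speedup("4:47", "1:30")   # current prettyprinter vs. modified stylesheet
memory_ratio = 54 / 72                  # modified stylesheet memory / current memory

print(f"time factor:  {time_factor:.2f}")
print(f"memory ratio: {memory_ratio:.2f}")
```

The time factor comes out slightly above 3, and the modified stylesheet uses three quarters of the memory, which is consistent with "roughly a factor 3" for speed only.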

Axel Hecht

Reporter

Updated

•

21 years ago

Blocks: 232990

David Baron :dbaron:

Updated

•

21 years ago

No longer blocks: 232990

Depends on: 232990

Fini A. Alring

Comment 10

•

20 years ago

Is any work being done on XML pretty print at the moment? I have studied the XSL pretty printer and found that its output is roughly eight times (8x) larger than the source; this gets very slow on larger XML files, as they begin to hog large amounts of memory (I often work with files in the range of 500kb to 5000MB). The HTML output could easily be minimized by using shorter class names and similar approaches. The result should be quicker response times (? I guess) and less memory use. I am willing to assist if and where I am needed. ;)

Axel Hecht

Reporter

Comment 11

•

20 years ago

It's much more important to cut down the number of involved elements. Not sure if class names would have an equivalent impact. (I'm afraid that attribute values aren't stored as atoms for class="", though.) It'd be much more compact to store the information in element names instead of spans and classes, though.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 12

•

20 years ago

Classnames are actually stored as atoms, so changing the classname to something shorter will have no effect at all. However, what could have a small effect as far as classes go is to avoid having more than one class on a single element. In those cases we store the attribute as a list of atoms, which takes much more space than a single atom.

However, it would probably have a much greater effect to cut down on the number of elements. Getting rid of the tables would be great if it's possible.

Though the best upcoming way to increase the speed is bug 208172. The patch in there increases the speed of the prettyprint stylesheet by a factor of 6. I'm not marking that bug a blocker of this one though, since this bug is about bloat and not speed; I just figured I should mention it since comment 10 talked about slowness.

Finally, if you're working with 5GB XML files, Mozilla won't be the tool to use. There's no way in hell we'll be able to render files like that prettyprinted in any sane sort of way.

Fini A. Alring

Comment 13

•

20 years ago

Hi, and thanks for the feedback. I am sure long classnames would have no big effect once in the Gecko DOM and layout, but the fact that the source is XSLT-transformed into a much bigger document, which subsequently has to be parsed and rendered, does seem a little bloated to me. Considering that the XSLT output was 8x bigger than the source (I only tested one large example so far, but it should give a rough factor), I'm guessing it could be streamlined a bit.

Fini A. Alring

Comment 14

•

20 years ago

Perhaps a version featuring only the folding part, omitting the color-coding, would be beneficial for people working with larger XML documents (500kb+). Hopefully I will have some time to experiment more with this during the holidays.

Fini A. Alring

Comment 15

•

20 years ago

(In reply to comment #12)
> Finally, if you're working with 5GB xml files mozilla won't be the tool to use.
> There's no way in hell we'll be able to render files like that prettyprinted in
> any sane sort of way.

Whoops, I meant 5MB not 5000MB, sorry for the mistake.

Axel Hecht

Reporter

Comment 16

•

20 years ago

(In reply to comment #13) We never serialize and parse the XSLT output, thus the string size doesn't matter, just the content size does.

Fini A. Alring

Comment 17

•

20 years ago

(In reply to comment #16)
I'm a little confused by that comment. The XSLT output is the very HTML code that shows the pretty-printed XML, thus it must be parsed - and it does make a difference if the XSLT output is 8MB or, say, 4MB, since it would quite simply be less data to handle. I have worked a lot with XSLT, but not with the inner workings of Mozilla. Are you catching the XSLT output as a DOM tree and passing it directly to Gecko?

Wladimir Palant

Comment 18

•

20 years ago

(In reply to comment #17) No, in Mozilla XSLT output is a DOM document, not its string representation. Serializing it into a string would be an additional step that isn't done here.

Fini A. Alring

Comment 19

•

20 years ago

(In reply to comment #18)
OK, that makes good sense. I will return when I have made tests; my goal is to reduce the size of the XSL output DOM object, which to me still seems reasonable using existing technology and implementation - however, I think that on a longer timescale a more low-level solution would be a great benefit in terms of speed and memory efficiency.

Boris Zbarsky [:bzbarsky]

Comment 20

•

18 years ago

So as of bug 379683 there are no more tables. Only divs and spans, some styled as inline-block (and some using the default block styling). All the uncolored plaintext is now just that -- plaintext. No spans around it. So if desired it should be easy to transition to XML instead of HTML if there's a win to it (as suggested by comment 2). Is there such a win?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 21

•

18 years ago

I think those benefits are gone these days since we no longer need to case-convert for LREs (i.e. for things like <foo> in the stylesheet). That is all done at compile time.

Axel Hecht

Reporter

Comment 22

•

18 years ago

Niceness :-) Anyway, I guess there could be still a win, in both content creation and CSS resolution. Things like <span class="foo"> could just be <foo>, as long as we have only one static class. That way, we don't have to create attributes for those elements, which should make us get rid of some allocations, and some modification events, too. I have no idea if it would help our style resolution if we don't have to access class attributes.
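To make the attribute saving concrete, here is a minimal Python sketch that counts elements and attributes for the two markup styles. The fragments and names in it (`start-tag`, `attribute-name`, `count_nodes`) are invented for illustration and are not taken from the actual prettyprint stylesheet; it only shows that the element count stays the same while the attribute count drops to zero.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def count_nodes(markup: str) -> Tuple[int, int]:
    """Return (number of elements, number of attributes) in a fragment."""
    root = ET.fromstring(markup)
    elements = list(root.iter())
    attributes = sum(len(element.attrib) for element in elements)
    return len(elements), attributes


# Hypothetical output in the span-plus-class style.
span_style = (
    '<div class="element">'
    '<span class="start-tag">chapter</span>'
    '<span class="attribute-name">id</span>'
    '</div>'
)

# The same information carried by element names instead of class attributes.
element_style = (
    '<element>'
    '<start-tag>chapter</start-tag>'
    '<attribute-name>id</attribute-name>'
    '</element>'
)

print("span style:   ", count_nodes(span_style))
print("element style:", count_nodes(element_style))
```

Whether the saved attribute allocations translate into a measurable win is exactly the open question in this comment; the sketch says nothing about style resolution cost.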

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 23

•

17 years ago

Looking at the last profile in bug 197956, we spend about 25000 hits setting attributes, out of 190000 for the XSLT part overall. Given bug 197956 comment 56 (the last comment there), which says that the XSLT part is now around 50%, that means that we'd save some 5% by fixing this bug.

Boris Zbarsky [:bzbarsky]

Comment 24

•

17 years ago

The XSLT part is about 33%, not 50%, actually.
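For reference, the estimate from comment 23 can be redone with both XSLT shares. This is a back-of-the-envelope Python sketch; the function name is made up, and it assumes that all of the attribute-setting time would disappear, so the results are upper bounds.

```python
ATTRIBUTE_HITS = 25000   # profile hits spent setting attributes (comment 23)
XSLT_HITS = 190000       # profile hits for the XSLT part overall (comment 23)


def estimated_saving(xslt_share: float) -> float:
    """Upper bound on total time saved if attribute setting went away.

    xslt_share is the fraction of the total time spent in the XSLT part.
    """
    attribute_share_of_xslt = ATTRIBUTE_HITS / XSLT_HITS
    return attribute_share_of_xslt * xslt_share


for share in (0.50, 0.33):
    print(f"XSLT share {share:.0%}: saving up to {estimated_saving(share):.1%}")
```

With a 50% share this gives a bit under 7%, and with 33% a bit over 4%, so "some 5%" stays a fair ballpark either way.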

Boris Zbarsky [:bzbarsky]

Comment 25

•

16 years ago

For what it's worth, I tried converting this stuff to XML, and didn't see much of a speed or memory difference... Maybe I didn't do a very good job on the XML.

Phil Ringnalda (:philor)

Updated

•

15 years ago

QA Contact: ashshbhatt → xml

Axel Hecht

Reporter

Updated

•

8 years ago

Assignee: axel → nobody

BMO Automation

Updated

•

2 years ago

Severity: normal → S3