Open
Bug 1397380
Opened 7 years ago
Updated 2 years ago
stylo: Investigate sources of stylo memory overhead on the html single-page spec
Categories
(Core :: CSS Parsing and Computation, defect, P3)
Tracking
()
NEW
Tracking | Status | |
---|---|---|
firefox57 | --- | wontfix |
People
(Reporter: bzbarsky, Unassigned)
References
(Depends on 1 open bug, )
Details
Attachments
(4 files, 13 obsolete files)
Spun off from bug 1384945 for the non-adblock investigation.
I'll be posting some memory reports and analysis.
Updated•7 years ago
|
Summary: Investigate sources of stylo memory overhead on the html single-page spec → stylo: Investigate sources of stylo memory overhead on the html single-page spec
Reporter | ||
Comment 1•7 years ago
|
||
Reporter | ||
Comment 2•7 years ago
|
||
Reporter | ||
Comment 3•7 years ago
|
||
Reporter | ||
Comment 4•7 years ago
|
||
Reporter | ||
Comment 5•7 years ago
|
||
Reporter | ||
Comment 6•7 years ago
|
||
OK, so some notes. The baseline no-stylo explicit allocation number here is ~270MB for the web content process. All measurements are done on an 8-core (+ hyperthreading, so 16 logical cores) Linux box. The STYLO_THREADS environment variable was used to force particular stylo threading behaviors.
Going from non-stylo to stylo-1-thread seems to have the following changes:
-4MB style contexts.
-3MB Gecko style structs
+12MB ComputedValues
+16MB servo style structs
+8MB heap-unclassified
+2MB heap-overhead, mostly bin-unused and bookkeeping.
Going from 1 thread to 3 threads for stylo gives us:
+10MB ComputedValues
+10MB servo style structs
+2MB heap-unclassified
+6MB heap-overhead, split 50-50 between bin-unused and page-cache.
Going from 3 threads to 6 threads for stylo gives us:
+2.5MB ComputedValues
+1.2MB servo style structs
no real changes to heap-overhead or heap-unclassified.
Going from 6 to 12 threads for stylo gives us:
+3MB ComputedValues
+3MB servo style structs
+7MB heap-overhead
no real changes to heap-unclassified
I suspect the heap-overhead bits for 6 threads being exactly like 3 and then 12 being a lot more may somewhat be noise.
I will look into the heap-unclassified here next.
Comment 7•7 years ago
|
||
(P3 just to avoid having multiple blocking bugs tracking memory usage)
Priority: -- → P3
Reporter | ||
Comment 8•7 years ago
|
||
Oh, and notably all the above numbers are on 64-bit; I don't have an easily accessible 32-bit setup right now.
I tried doing a DMD run with STYLO_THREADS=3 to look at the heap-unclassified there. The following bits are popping up as stylo-related:
1) 160185 32-byte allocations for ElementData. That's a bit over 5MB, or about
half the increase in heap-unclassified over Gecko if my numbers above match up
with this number. This page has 154180 elements on it, and some of those have
::before/::after, so this number of ElementData instances looks quite plausible.
I filed bug 1397472 on adding a memory reporter here.
2) 10322 128-byte, 260 112-byte, 5876 80-byte, 63 64-byte, 99 32-byte allocations of
nsTArray buffers under Gecko_FontFamilyList_AppendNamed and nsFont constructors.
This adds up to about 1.8MB with servo, but only 340KB with Gecko, presumably
because we have more Font structs with Servo. This is an array of FontFamilyName structs.
Each such struct has a uint32_t type, then an nsString. nsString is a char_type*,
then uint32_t, then uint16_t, then uint16_t. So nsString is 16 bytes on 64-bit,
but alignment presumably means FontFamilyName is 32 bytes (vs 16 bytes on 32-bit).
Plus we have the 8 bytes of length/capacity arrays store. So a 112-byte allocation
should hold 3 family names, while a 128 byte one can't hold 4... I dunno why we
have 128-byte allocations here, unless we ended up doubling from a smaller size.
Anyway, maybe we can shrink FontFamilyName, maybe we can do more sharing or something.
3) 2828 272-byte allocations of ComputedValues, created from Servo_ComputedValues_Inherit,
called from ResolveStyleForText. This is about 800KB. I'm not sure where this ComputedValues
is hanging out such that we don't report it. Maybe cached on its parent ComputedValues? We
certainly _are_ reporting, via nsIFrame::AddSizeOfExcludingThisForTree, many megabytes of other
ComputedValues we create for text... Might be good to figure out how to memory-report this.
4) 95 272-byte allocations of ComputedValues created from nsButtonFrameRenderer::ReResolveStyles
calling ProbePseudoElementStyle. I guess this is the mInnerFocusStyle of the button frame.
This is only 26KB, so not a huge deal, but we should still add a memory reporter here.
5) 9 ComputedValues allocations that came from Servo_TraverseSubtree computing style for an element.
I have no idea why these were not reported, and that worries me a bit.
6) Various allocations under Servo_StyleSheet_FromUTF8Bytes. Might be about 500KB in there;
it's a bit hard to tell because they're so scattered all over. I had thought we had
memory reporters for the servo stylesheets...
7) Array allocations from Gecko_ClearAndResizeStyleContents, allocating arrays of nsStyleContentData.
Looks like about 110KB of this. In Gecko it's not in the top 1000 heap block records.
8) 12 8192-byte blocks allocated in SHARING_CACHE_KEY::__init. Why 12? I'm pretty sure I did
STYLO_THREADS=3. 12 is the normal number we'd use; are we forgetting to consider
STYLO_THREADS somewhere? Or is this just coincidence? I also have 12 BLOOM_KEY::__init,
9) A few hundred KB under EagerPseudoStyles::set.
The stacks DMD produces for me have useful symbols, but claim "crtstuff.c:?" for the file/line in the Gecko code, which makes it a bit hard to tell where some allocations are happening. :(
Anyway, the upshot is that we have lots of ElementData we're not reporting, we have a bunch more FontFamilyName array stuff due to worse struct/style sharing, we have fair amounts of ComputedValues (esp for text) that we're not reporting yet. I think those items between them cover at least 3/4 of the heap-unclassified regression.
Reporter | ||
Comment 9•7 years ago
|
||
Reporter | ||
Comment 10•7 years ago
|
||
Comment 11•7 years ago
|
||
(In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from comment #8)
> Anyway, maybe we can shrink FontFamilyName, maybe we can do more sharing or something.
We can probably turn font family names into atom so that we avoid that allocation.
Comment 12•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #11)
> (In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from
> comment #8)
> > Anyway, maybe we can shrink FontFamilyName, maybe we can do more sharing or something.
>
> We can probably turn font family names into atom so that we avoid that
> allocation.
Oh, the array buffer under Gecko_FontFamilyList_AppendNamed would not be affected by using atom, so yes this is really just from lack of style struct sharing.
Reporter | ||
Comment 13•7 years ago
|
||
> the array buffer under Gecko_FontFamilyList_AppendNamed would not be affected by using atom
Well, it would, in that an nsIAtom* is smaller than an nsString....
Comment 14•7 years ago
|
||
Would it make sense to make FontFamilyList a refcounted object, and have that object created (and stored) in the specified value of font-family?
Reporter | ||
Comment 15•7 years ago
|
||
Reporter | ||
Comment 16•7 years ago
|
||
Reporter | ||
Comment 17•7 years ago
|
||
Reporter | ||
Comment 18•7 years ago
|
||
So the patch in bug 1367635 helps a good bit with the style struct numbers. They're still not down to Gecko level, but a lot closer than without. With that patch, we get behavior like this (to be compared to comment 6):
Going from non-stylo to stylo-1-thread:
+6MB servo style structs
Going from 1 thread to 3 threads for stylo gives us:
+2.4MB servo style structs
Going from 3 threads to 6 threads for stylo gives us:
+2.25MB servo style structs
Reporter | ||
Comment 19•7 years ago
|
||
Reporter | ||
Comment 20•7 years ago
|
||
Reporter | ||
Comment 21•7 years ago
|
||
Reporter | ||
Comment 22•7 years ago
|
||
> Would it make sense to make FontFamilyList a refcounted object
Possibly, yes.
Comment 23•7 years ago
|
||
> but alignment presumably means FontFamilyName is 32 bytes (vs 16 bytes on 32-bit).
This doesn't sound correct. FontFamilyName should be 24 bytes on 64bit, and 16 bytes on 32bit.
So 128 bytes can hold 5, 112 bytes can hold 4, 80 bytes can hold 3, 64 bytes can hold 2, and 32 bytes can hold 1.
If we change FontFamilyName to store an atom, it would become 16 bytes (vs 8 bytes on 32bit), and we would have 96 bytes for 5, 80 bytes for 4, 64 bytes for 3, 48 bytes for 2, and 32 bytes for 1. That sum up to ~1.3MB, which doesn't seem to be a big win.
Also note that, 16,620 nsStyleFont is 2MB itself... so probably finding a way to share this struct is still worthy?
Comment 24•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #23)
> > but alignment presumably means FontFamilyName is 32 bytes (vs 16 bytes on 32-bit).
>
> This doesn't sound correct. FontFamilyName should be 24 bytes on 64bit, and
> 16 bytes on 32bit.
malloc_usable_size(malloc(24)) is 32.
Comment 25•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #24)
> malloc_usable_size(malloc(24)) is 32.
FontFamilyName is stored in an array, I don't think malloc_usable_size applies here.
Reporter | ||
Comment 26•7 years ago
|
||
> FontFamilyName should be 24 bytes on 64bit
Good catch.
> That sum up to ~1.3MB, which doesn't seem to be a big win.
"big" is kinda relative. We have a 60MB regression, so 1.3MB is not a huge chunk of it, but it's not nothing either.
I agree that sharing the Font struct would be good too, and anything we do there will reduce the win we get from making the FontFamilyName storage smaller.
Anyway, I went through the list in comment 8 and filed bug 1397813 on item 3, bug 1397815 on item 4. As for item 8, I tried again with STYLO_THREADS=3 and can't reproduce. So I must have somehow failed to pass STYLO_THREADS to mach run correctly when I got the numbers above...
Reporter | ||
Comment 27•7 years ago
|
||
Attachment #8905180 -
Attachment is obsolete: true
Attachment #8905181 -
Attachment is obsolete: true
Attachment #8905182 -
Attachment is obsolete: true
Attachment #8905184 -
Attachment is obsolete: true
Attachment #8905185 -
Attachment is obsolete: true
Attachment #8905312 -
Attachment is obsolete: true
Attachment #8905313 -
Attachment is obsolete: true
Attachment #8905316 -
Attachment is obsolete: true
Attachment #8905317 -
Attachment is obsolete: true
Attachment #8905318 -
Attachment is obsolete: true
Attachment #8905325 -
Attachment is obsolete: true
Attachment #8905326 -
Attachment is obsolete: true
Attachment #8905327 -
Attachment is obsolete: true
Reporter | ||
Comment 28•7 years ago
|
||
Compared to no-stylo this shows:
+16MB ComputedValues
+11MB servo style structs
+5MB servo element data
-4MB style contexts
-3MB gecko style structs
-2MB "pres-shell" (not sure where this is coming from)
+2MB heap-unclassified
+1MB heap-overhead
Compare this to comment 6, where we had +22MB ComputedValues and +26MB servo style structs, +10MB heap-unclassified, +8MB heap-overhead. The total regression back then was ~57MB; now it's ~26MB.
Reporter | ||
Comment 29•7 years ago
|
||
This shows:
+4.5MB ComputedValues
+2MB servo-style-structs
+0.5MB heap-unclassified
+0.5MB heap-overhead
when compared against 3 threads. Again, compare comment 6, where we used to only add 3.7MB here. This makes sense, since we added more various sharing stuff, but it's all thread-local...
Reporter | ||
Comment 30•7 years ago
|
||
Here we have a 15MB regression over no-stylo. ComputedValues is ~9MB and servo-stye-structs ~8MB.
Updated•7 years ago
|
status-firefox57:
--- → wontfix
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•