Closed Bug 920149 Opened 11 years ago Closed 11 years ago

certain strings appear to be duplicated many times in tbpl

Categories

(Tree Management Graveyard :: TBPL, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: froydnj, Assigned: Swatinem)

References

Details

Attachments

(3 files, 2 obsolete files)

about:memory current says for TBPL that these strings are "notable" (i.e. taking up an outsized amount of memory): │ │ └───9.20 MB (00.81%) -- js-zone(0x7fd4481ab800) │ │ ├──5.65 MB (00.50%) ── unused-gc-things │ │ ├──3.20 MB (00.28%) -- strings │ │ │ ├──2.41 MB (00.21%) ++ normal │ │ │ ├──0.48 MB (00.04%) -- notable │ │ │ │ ├──0.04 MB (00.00%) ── string(length=49, copies=1186, "https://tbpl.mozilla.org/php/getParsedLog.php?id=")/gc-heap │ │ │ │ ├──0.04 MB (00.00%) ── string(length=50, copies=1186, "https://tbpl.mozilla.org/php/getLogExcerpt.php?id=")/gc-heap │ │ │ │ ├──0.03 MB (00.00%) ── string(length=7, copies=1128, "success")/gc-heap │ │ │ │ ├──0.02 MB (00.00%) ++ string(length=82, copies=123, "buildbot-master67.srv.releng.use1.mozilla.com:/builds/buildbot/tests1-linux/master") I've included the "buildbot-..." string so you can see how many more copies of the URL and "success" there are. And this is for a close-to-freshly-loaded page; leaving tbpl open for several hours or even overnight, it wouldn't be unusual to have the URLs and "success" taking up half a megabyte or more. This state of affairs seems unusual.
Nick, are there any tools that can help determine who's holding on to the copies of all those strings?
Flags: needinfo?(n.nethercote)
Get a JS heap dump (the easiest way to do that is to get a CC dump, and it will be produced at the same time: https://wiki.mozilla.org/Performance:Leak_Tools#Cycle_collector_heap_dump ). Then you can use my GC log analysis scripts at https://github.com/amccreight/heapgraph/tree/master/g stringy.py should give you a reasonable overview of what strings there are, then you can use --string-mode with find_roots.py to see what is holding it alive. It is a bit rough, but I've used it to successfully track down the cause of some string bloat in B2G.
Thats very interesting. Basically most of tbpl is just string concat + innerHTML. Ideally nothing from script side should hold onto them, they should be embedded into the DOM as href, title and other kind of attributes or textnodes. I’ve read a blog some time ago about string handling in spidermonkey, maybe someone can enlighten me further. As these are just string fragments of stuff we concatenate, I guess they must be parts of rope strings. I just don’t know why they are duplicated. If they are indeed ropes, they should be shared somehow? So what kind of best practices do you have to mitigate this problem? I mean its really a common problem, a *lot* of templating engines use string concat (or Array.join) internally.
I don't have anything notable to add, other than the amount taken up doesn't seem that high to me.
(In reply to Andrew McCreight [:mccr8] from comment #2) > Get a JS heap dump (the easiest way to do that is to get a CC dump, and it > will be produced at the same time: > https://wiki.mozilla.org/Performance:Leak_Tools#Cycle_collector_heap_dump ). > Then you can use my GC log analysis scripts at > https://github.com/amccreight/heapgraph/tree/master/g stringy.py should > give you a reasonable overview of what strings there are, then you can use > --string-mode with find_roots.py to see what is holding it alive. It is a > bit rough, but I've used it to successfully track down the cause of some > string bloat in B2G. What's the second argument to find_roots.py? The address of the string from the gc log?
Flags: needinfo?(n.nethercote) → needinfo?(continuation)
Looks like with --string-mode it is the string you want to look for.
Flags: needinfo?(continuation)
(In reply to Arpad Borsos (Swatinem) from comment #3) > Thats very interesting. Basically most of tbpl is just string concat + > innerHTML. > > Ideally nothing from script side should hold onto them, they should be > embedded into the DOM as href, title and other kind of attributes or > textnodes. My guess is that the strings are actually generated hereabouts: https://hg.mozilla.org/webtools/tbpl/file/ecbd85696df3/js/BuildbotDBUser.js#l61 If the JS engine is smart enough to concatenate constant strings in there, it ought to be smart enough to share the strings, too. The GC log never actually displays those strings, though, it just displays '<rope: length 49>' or '<rope: length 50>'. And the ropes are definitely shared. I'm wondering if about:memory is not somehow miscounting those strings. Nick?
Flags: needinfo?(n.nethercote)
Thanks for the info. So ye, we do save a lot of useless strings on objects. Would it make sense to replace those by getters that output the string dynamically instead of predefining it?
(In reply to Arpad Borsos (Swatinem) from comment #8) > Thanks for the info. So ye, we do save a lot of useless strings on objects. > Would it make sense to replace those by getters that output the string > dynamically instead of predefining it? I think that's worth a shot, yes. AFAICS, you wouldn't normally need most of the strings anyway on a given tbpl page (you only want to look at a couple of jobs at a time), so I think you'd do less work by generating the strings on demand. Not sure that I even have the right place; looks plausible, though.
The string measurement is accurate, AFAIK.
Flags: needinfo?(n.nethercote)
Attached patch Move properties to accessors (deleted) — Splinter Review
The patch moves the predefined properties into dynamic getters (they are completely transparent)
Assignee: nobody → arpad.borsos
Status: NEW → ASSIGNED
Attachment #816218 - Flags: review?(emorley)
Attached file about:memory measurement (deleted) —
So here is a verbose about:memory measurement. STR: * load the inbound tree * click the down (load more) arrow * minimize memory * take measurement most notable changes: 43,330,864 B (07.69%) -- top(https://tbpl.mozilla.org/?tree=Mozilla-Inbound, id=97) vs 31,009,736 B (05.50%) -- top(file:///home/swatinem/Coding/tinderboxpushlog/index.html?tree=Mozilla-Inbound, id=95) 11,507,504 B (02.04%) -- strings 757,184 B (00.13%) -- notable vs 4,079,856 B (00.72%) -- strings 222,560 B (00.04%) -- notable Maybe someone with a lot heavier tbpl usage than me can prove these improvements?
Another question to the js engine pros: The first measurement shows the whole UserInterface.js code as a notable string. Why? And why not the even bigger jquery.js?
Attached patch slimfast.patch (obsolete) (deleted) — Splinter Review
MachineResult.runID was casted to string needlessly, also .tree is always the same. Also defined .push and .order to keep with the best practice to "define all properties in the constructor and dont change shapes of objects on runtime". I did the opposite to Machine.jobPartNumber: some memoization, since I saw that function a lot in profiles. There are not a lot of these structures around, so it should not matter for mem usage. The "success" and "testfailed" string is actually saved literally in the DB and comes in via JSON.
Attachment #816330 - Flags: review?(emorley)
With those two patches, viewing mozilla-central with 20 pushes (not exactly a heavy load, I typically have 6+ TBPL tabs open with 10-30 pushes shown in each)... Before: │ ├───49.77 MB (10.29%) -- top(https://tbpl.mozilla.org/, id=364) │ │ ├──28.70 MB (05.93%) -- active/window(https://tbpl.mozilla.org/) │ │ │ ├──11.68 MB (02.41%) -- dom │ │ │ │ ├───9.74 MB (02.01%) ── element-nodes │ │ │ │ └───1.94 MB (00.40%) ++ (4 tiny) │ │ │ ├───9.47 MB (01.96%) -- layout │ │ │ │ ├──5.23 MB (01.08%) ++ frames │ │ │ │ └──4.24 MB (00.88%) ++ (6 tiny) │ │ │ ├───7.48 MB (01.55%) -- js-compartment(https://tbpl.mozilla.org/) │ │ │ │ ├──6.51 MB (01.35%) -- objects │ │ │ │ │ ├──4.84 MB (01.00%) ++ gc-heap │ │ │ │ │ └──1.67 MB (00.34%) ++ (2 tiny) │ │ │ │ └──0.97 MB (00.20%) ++ (5 tiny) │ │ │ └───0.07 MB (00.02%) ++ (2 tiny) │ │ ├──20.52 MB (04.24%) -- js-zone(0x3045f000) │ │ │ ├──14.04 MB (02.90%) -- strings │ │ │ │ ├──12.93 MB (02.67%) -- normal │ │ │ │ │ ├───9.87 MB (02.04%) ── malloc-heap │ │ │ │ │ └───3.06 MB (00.63%) ── gc-heap │ │ │ │ └───1.11 MB (00.23%) ++ (2 tiny) │ │ │ ├───6.25 MB (01.29%) ── unused-gc-things │ │ │ └───0.23 MB (00.05%) ++ (5 tiny) │ │ └───0.55 MB (00.11%) ++ cached/window(about:home) With patch: │ ├───37.21 MB (07.69%) -- top(file:///C:/src/tbpl/index.html, id=354) │ │ ├──26.95 MB (05.57%) -- active/window(file:///C:/src/tbpl/index.html) │ │ │ ├──12.30 MB (02.54%) -- dom │ │ │ │ ├──10.36 MB (02.14%) ── element-nodes │ │ │ │ └───1.94 MB (00.40%) ++ (4 tiny) │ │ │ ├───7.86 MB (01.63%) -- layout │ │ │ │ ├──5.23 MB (01.08%) ++ frames │ │ │ │ └──2.63 MB (00.54%) ++ (6 tiny) │ │ │ ├───6.71 MB (01.39%) -- js-compartment(file:///C:/src/tbpl/index.html) │ │ │ │ ├──5.79 MB (01.20%) ++ objects │ │ │ │ └──0.92 MB (00.19%) ++ (5 tiny) │ │ │ └───0.07 MB (00.02%) ++ (2 tiny) │ │ ├───9.74 MB (02.01%) -- js-zone(0xbe29400) │ │ │ ├──5.66 MB (01.17%) -- strings │ │ │ │ ├──4.88 MB (01.01%) ++ normal │ │ │ │ └──0.77 MB (00.16%) ++ (2 tiny) │ │ │ └──4.09 MB (00.84%) ++ (6 tiny) │ │ └───0.53 MB (00.11%) ++ cached/window(about:home) -> 24% reduction... not to be sniffed at! Awesome :-D
Attachment #816218 - Flags: review?(emorley) → review+
Comment on attachment 816330 [details] [diff] [review] slimfast.patch Review of attachment 816330 [details] [diff] [review]: ----------------------------------------------------------------- This is great - thank you! :-)
Attachment #816330 - Flags: review?(emorley) → review+
Attached patch Other cleanups (obsolete) (deleted) — Splinter Review
(In reply to Arpad Borsos (Swatinem) from comment #14) > MachineResult.runID was casted to string needlessly, also .tree is always > the same. Hm, id has type int in the database, but comes out as string in the json. So I cast it back to a number. Also I got rid of those notable "success" strings by a clever hack. Maybe it is getting a bit ridiculous by now :-D
Attachment #816330 - Attachment is obsolete: true
Attachment #816335 - Flags: review?(emorley)
Just as a side note: The comparison is a little bit in favor of the in-production version, since it does not have the split to b2g desktop yet, with all the additional dom nodes and associated overhead. But anyway, there is still a clear win :-)
Comment on attachment 816335 [details] [diff] [review] Other cleanups Review of attachment 816335 [details] [diff] [review]: ----------------------------------------------------------------- r- due to TypeError breaking job selection ::: js/BuildbotDBUser.js @@ +45,5 @@ > + // a constant string. Saves us ~200k for ~5k results. \o/ > + _staticStrings: { > + "success": "success", > + "testfailed": "testfailed", > + }, Please can you also add the other result states from: https://hg.mozilla.org/webtools/tbpl/file/4b9bf20dad6a/dataimport/import-buildbot-data.py#l83 @@ +61,3 @@ > "machine": machine, > "slave": run.slave, > + "runID": parseInt(run._id, 10), Unfortunately this causes a TypeError, since the runID can helpfully not actually be a integer - in the case of pending and running jobs, which are prefixed with "pending-" and "running-" appropriately, so we trip up against: https://hg.mozilla.org/webtools/tbpl/file/4b9bf20dad6a/js/UserInterface.js#l1435 ...when selecting a job in the main UI.
Attachment #816335 - Flags: review?(emorley) → review-
Attached patch Other cleanups (deleted) — Splinter Review
(In reply to Ed Morley [:edmorley UTC+1] from comment #19) > Please can you also add the other result states from: > https://hg.mozilla.org/webtools/tbpl/file/4b9bf20dad6a/dataimport/import- > buildbot-data.py#l83 Done > > "machine": machine, > > "slave": run.slave, > > + "runID": parseInt(run._id, 10), > > Unfortunately this causes a TypeError, since the runID can helpfully not > actually be a integer - in the case of pending and running jobs, which are > prefixed with "pending-" and "running-" appropriately, so we trip up against: > https://hg.mozilla.org/webtools/tbpl/file/4b9bf20dad6a/js/UserInterface. > js#l1435 > ...when selecting a job in the main UI. Oh, so the string cast was actually useful. I added it back as a .toString() so we don’t break when php suddenly decides to json_encode it as a real int.
Attachment #816335 - Attachment is obsolete: true
Attachment #816345 - Flags: review?(emorley)
Comment on attachment 816345 [details] [diff] [review] Other cleanups Review of attachment 816345 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, thank you :-)
Attachment #816345 - Flags: review?(emorley) → review+
Depends on: 931759
In production :-)
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: