Closed
Bug 627220
Opened 14 years ago
Closed 7 years ago
Finalize external strings in the background
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox56 | --- | fixed |
People
(Reporter: gal, Assigned: jandem, NeedInfo)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
(deleted),
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
sfink
:
review+
|
Details | Diff | Splinter Review |
External strings are currently finalized via an API callback. This is lame, at least for our current embedding, because all we end up doing in the callback is nsMemory::Free which does NS_Free which does NS_Free_P which does free. Our default engine-internal treatment is to call free as well, but we do so from a background thread, which is quite a bit faster.
Reporter | ||
Comment 1•14 years ago
|
||
External string allocation profile for a mix of cnn.com, facebook.com and slashdorg.org: 647 0 1091 1 467 2 1034 3 2093 4 660 5 576 6 1379 7 853 8 785 9 676 10 1676 11 818 12 592 13 453 14 852 15 858 16 327 17 186 18 399 19 174 20 168 21 157 22 179 23 125 24 109 25 42 26 28 27 193 28 96 29 74 30 98 31 114 32 62 33 243 34 42 35 53 36 221 37 54 38 52 39 59 40 51 41 45 42 120 43 147 44 51 45 56 46 293 47 34 48 35 49 55 50 161 51 103 52 82 53 62 54 44 55 215 56 82 57 56 58 83 59 39 60 57 61 55 62 50 63 30 64 35 65 106 66 33 67 104 68 54 69 96 70 68 71 95 72 183 73 56 74 105 75 37 76 75 77 136 78 52 79 136 80 63 81 100 82 49 83 82 84 63 85 71 86 69 87 67 88 85 89 31 90 97 91 34 92 66 93 30 94 38 95 20 96 22 97 35 98 16 99 10 100 35 101 39 102 54 103 7 104 16 105 20 106 6 107 16 108 3 109 10 110 6 111 11 112 14 113 21 114 3 115 6 116 14 117 6 118 5 119 13 120 4 121 3 122 4 123 4 124 7 125 13 126 7 127 3 128 6 129 1 130 2 131 4 132 6 133 3 134 1 135 11 136 1 140 3 141 6 142 12 144 5 145 1 147 2 148 3 152 1 153 3 154 1 155 3 156 3 157 2 158 1 159 2 160 2 161 1 162 1 163 6 166 19 168 1 169 4 170 14 171 2 172 1 174 18 175 8 182 1 184 6 185 9 187 1 188 2 191 6 192 5 193 1 194 1 196 5 197 3 198 2 199 1 200 2 201 22 202 15 204 2 208 1 210 10 214 2 218 2 221 1 222 1 224 6 225 22 226 1 230 1 232 27 234 1 236 1 238 10 240 24 242 1 245 1 246 1 247 1 248 12 249 1 250 1 251 10 252 2 254 1 259 1 264 1 266 4 274 10 276 7 282 8 285 1 298 1 307 6 332 6 360 28 427 10 439 1 445 1 452 2 457 11 465 17 491 35 504 24 573 2 581 1 663 4 666 2 699 1 707 2 777 12 827 1 832 1 840 2 843 2 857 2 872 2 876 2 985 2 1138 2 1140 1 1462 2 1519 1 1523 2 3068 4 3413 2 3732 2 4595 1 6690 1 7238 2 16384 1 138318 6 851954 2 990121 There is a few really huge strings here, and quite a few really short ones that we could use short strings or internal strings for, but no clear simple wins.
Reporter | ||
Comment 2•14 years ago
|
||
(note that this isn't as simple for DOM strings, those to nsStringBuffer::FromData(chars)->Release())
Comment 3•14 years ago
|
||
That Release call should, in fact, be the common case of external strings, no? Unless the callee had an autostring....
Reporter | ||
Comment 4•14 years ago
|
||
Yeah (the data above is for both cases, it was taken in JS_NewExternalString). Both finalizers can run from a background thread though, the ref counting of the string is atomic. anygregor is working on a patch for that.
Comment 5•14 years ago
|
||
> the ref counting of the string is atomic
Yes, though note that we'd like it to stop being so...
Reporter | ||
Comment 6•14 years ago
|
||
First of all: yeah, do that. The atomic op is crazy expensive. I don't think it would hurt us here. These strings are relatively infrequent. I can ship them back in a runnable to the main thread and ref-count them down there.
Comment 7•14 years ago
|
||
Doing that is bug 585661, fwiw.
Updated•10 years ago
|
Assignee: general → nobody
Updated•7 years ago
|
Component: JavaScript Engine → JavaScript: GC
Summary: improve external string finalization → Finalize external strings in the background
Updated•7 years ago
|
Whiteboard: [qf]
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 8•7 years ago
|
||
Let's try Linux64 first: https://treeherder.mozilla.org/#/jobs?repo=try&revision=000eb24bbf30b460a8c361696adb4b92404c5199
Assignee | ||
Comment 9•7 years ago
|
||
Note that this would have been harder before bug 1353758. We used to clear Gecko's string cache in the finalizer but we no longer do that. This also means we can remove the Zone* argument from the finalizer as nobody uses it.
Assignee | ||
Comment 10•7 years ago
|
||
Try looks green. I'll be traveling tomorrow, will post patches for review soon.
Blocks: 1351769
Updated•7 years ago
|
Whiteboard: [qf] → [qf:p1]
Assignee | ||
Comment 11•7 years ago
|
||
None of the finalizers use the Zone* argument and it's best not to expose it once we may be called off thread.
Assignee | ||
Comment 12•7 years ago
|
||
Flags: needinfo?(jdemooij)
Attachment #8881092 -
Flags: review?(sphink)
Updated•7 years ago
|
Attachment #8881091 -
Flags: review?(sphink) → review+
Comment 13•7 years ago
|
||
Comment on attachment 8881092 [details] [diff] [review] Part 2 - Finalize external strings in the background Review of attachment 8881092 [details] [diff] [review]: ----------------------------------------------------------------- If that passes tests, this is certainly about as straightforward as you can get.
Attachment #8881092 -
Flags: review?(sphink) → review+
Comment 14•7 years ago
|
||
Pushed by jandemooij@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/efd7a75d3b34 part 1 - Remove unused Zone* argument from external string finalizers. r=sfink https://hg.mozilla.org/integration/mozilla-inbound/rev/d4bc665ad39b part 2 - Finalize external strings in the background. r=sfink
Comment 15•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/efd7a75d3b34 https://hg.mozilla.org/mozilla-central/rev/d4bc665ad39b
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment 16•7 years ago
|
||
Henri, did this actually help your encoding/decoding benchmark? I was seeing some weirdness where profiler-reported mainthread time dropped, but wall-clock measurements did not..
Flags: needinfo?(hsivonen)
(In reply to Boris Zbarsky [:bz] from comment #16) > Henri, did this actually help your encoding/decoding benchmark? I was > seeing some weirdness where profiler-reported mainthread time dropped, but > wall-clock measurements did not.. Some wall clock times become better (Decode js) but others became worse by quite a bit (Decode en, Decode de, Decode fr). ## Before What Language Encoding MiB/s Mi u16s / s ns / iteration Decode ar utf-8 256 217 1173902 Decode ar windows-1256 258 258 1008361 Decode cs utf-8 323 315 984831 Decode cs windows-1250 339 339 934001 Decode de utf-8 400 395 759988 Decode de windows-1252 419 419 732955 Decode el utf-8 167 146 1388130 Decode el windows-1253 165 165 1267433 Decode en utf-8 409 408 1646406 Decode en windows-1252 427 427 1587302 Decode fr utf-8 469 463 1911695 Decode fr windows-1252 468 468 1913543 Decode he utf-8 242 207 1036547 Decode he windows-1255 159 159 1392141 Decode ja utf-8 177 147 1335255 Decode ja shift_jis 142 130 1538963 Decode ko utf-8 138 119 1138206 Decode ko euc-kr 129 120 1167464 Decode pt utf-8 371 367 1524078 Decode pt windows-1252 370 370 1529838 Decode ru utf-8 265 227 2257596 Decode ru windows-1251 275 275 1882883 Decode th utf-8 377 308 2467706 Decode th windows-874 391 391 1961111 Decode tr utf-8 295 286 947936 Decode tr windows-1254 317 317 872234 Decode vi utf-8 272 257 2040392 Decode vi windows-1258 337 337 1755556 Decode zh_cn utf-8 312 274 953178 Decode zh_cn gb18030 242 225 1158397 Decode zh_tw utf-8 310 271 963019 Decode zh_tw big5 271 253 1045893 Decode ja euc-jp 150 137 1456071 Decode ja iso-2022-jp 79.43 67.29 2963345 Decode bin x-user-defined 174 174 73810 Decode js utf-8 188 188 440061 Decode js big5 190 190 434660 Decode js euc-kr 182 182 455065 Decode js gb18030 191 191 432953 Decode js gbk 188 188 439677 Decode js shift_jis 192 192 431096 Decode js euc-jp 188 188 440894 Decode js iso-2022-jp 77.66 77.66 1064802 Decode js windows-1250 192 192 430904 Decode js windows-1251 178 178 465639 Decode js windows-1252 189 189 436603 Decode js windows-1253 188 188 439229 Decode js windows-1254 185 185 446721 Decode js windows-1255 185 185 446721 Decode js windows-1256 186 186 445423 Decode js windows-1257 185 185 445833 Decode js windows-1258 181 181 456131 Decode js windows-874 192 192 430392 Decode js☃ utf-8 187 187 442646 Encode ar utf-8 134 74.82 164431 Encode cs utf-8 133 121 347444 Encode de utf-8 246 242 298258 Encode el utf-8 133 73.96 232621 Encode en utf-8 274 274 224858 Encode fr utf-8 169 163 805228 Encode he utf-8 129 74.16 365266 Encode ja utf-8 190 68.02 163995 Encode ko utf-8 138 59.75 138347 Encode pt utf-8 205 199 316752 Encode ru utf-8 141 78.94 789866 Encode th utf-8 191 68.41 925751 Encode tr utf-8 135 122 395010 Encode vi utf-8 114 87.60 725922 Encode zh_cn utf-8 188 69.64 156411 Encode zh_tw utf-8 175 64.83 168321 Encode js utf-8 314 314 263765 ## After What Language Encoding MiB/s Mi u16s / s ns / iteration Decode ar utf-8 171 145 1750520 Decode ar windows-1256 204 204 1277857 Decode cs utf-8 237 231 1340804 Decode cs windows-1250 245 245 1293655 Decode de utf-8 260 257 1171449 Decode de windows-1252 275 275 1116803 Decode el utf-8 203 178 1138713 Decode el windows-1253 205 205 1020184 Decode en utf-8 302 301 2233333 Decode en windows-1252 299 299 2266686 Decode fr utf-8 303 299 2960871 Decode fr windows-1252 305 305 2931200 Decode he utf-8 189 162 1329025 Decode he windows-1255 198 198 1115323 Decode ja utf-8 214 177 1103170 Decode ja shift_jis 172 157 1272135 Decode ko utf-8 154 133 1019358 Decode ko euc-kr 155 144 973668 Decode pt utf-8 270 267 2093879 Decode pt windows-1252 278 278 2040381 Decode ru utf-8 213 182 2818242 Decode ru windows-1251 217 217 2384516 Decode th utf-8 275 224 3387314 Decode th windows-874 267 267 2877903 Decode tr utf-8 214 207 1307787 Decode tr windows-1254 220 220 1255750 Decode vi utf-8 214 202 2593434 Decode vi windows-1258 251 251 2353548 Decode zh_cn utf-8 235 206 1265369 Decode zh_cn gb18030 193 179 1456071 Decode zh_tw utf-8 236 207 1263775 Decode zh_tw big5 203 190 1393010 Decode ja euc-jp 181 165 1209320 Decode ja iso-2022-jp 92.76 78.58 2537500 Decode bin x-user-defined 220 220 58543 Decode js utf-8 229 229 360997 Decode js big5 222 222 372837 Decode js euc-kr 223 223 370419 Decode js gb18030 223 223 370731 Decode js gbk 218 218 378857 Decode js shift_jis 216 216 383438 Decode js euc-jp 222 222 371745 Decode js iso-2022-jp 91.11 91.11 907597 Decode js windows-1250 231 231 357297 Decode js windows-1251 218 218 378663 Decode js windows-1252 230 230 360143 Decode js windows-1253 222 222 372268 Decode js windows-1254 225 225 368226 Decode js windows-1255 221 221 374879 Decode js windows-1256 219 219 378375 Decode js windows-1257 222 222 371812 Decode js windows-1258 219 219 377049 Decode js windows-874 219 219 378315 Decode js☃ utf-8 238 238 347533 Encode ar utf-8 133 74.49 165170 Encode cs utf-8 135 122 342971 Encode de utf-8 238 234 308094 Encode el utf-8 139 77.25 222716 Encode en utf-8 282 281 219035 Encode fr utf-8 165 159 825270 Encode he utf-8 134 76.96 351947 Encode ja utf-8 194 69.46 160589 Encode ko utf-8 138 60.03 137715 Encode pt utf-8 207 201 313627 Encode ru utf-8 142 79.36 785702 Encode th utf-8 199 71.29 888390 Encode tr utf-8 136 123 392599 Encode vi utf-8 116 89.27 712368 Encode zh_cn utf-8 185 68.48 159075 Encode zh_tw utf-8 185 68.39 159573 Encode js utf-8 319 319 259043
Flags: needinfo?(hsivonen)
Comment 18•7 years ago
|
||
Hrm. So I'm left wondering whether this fix is a win in general at all, given the behavior described in bug 1375566. :(
Flags: needinfo?(sphink)
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 19•7 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #18) > Hrm. So I'm left wondering whether this fix is a win in general at all, > given the behavior described in bug 1375566. :( So the problem is that instead of sweeping, we now spend a lot of time waiting for the background thread, right? We use background sweeping for a lot of other GC things (like non-external strings) and if there's an issue with that, it likely affects those other things as well. sfink or jonco can probably say more about this..
Flags: needinfo?(jdemooij)
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(jcoppeard)
Comment 20•7 years ago
|
||
> we now spend a lot of time waiting for the background thread, right? Well, it's _a_ problem. I don't know whether it's the only problem in this case. In particular, I can see how that problem would make this change not improve performance. It's not clear to me why performance would actually become "much worse" as comment 17 describes. That seems pretty fishy to me. Maybe we're ending up with significant contention on the malloc locks or something?
Comment 21•7 years ago
|
||
Could it be because of reasons along the lines of Julian's findings in bug 1291355 comment 26? jemalloc's multi-core performance is terrible: having a background thread call into jemalloc more often can certainly make malloc a lot slower, not due to lock contention, but due to cache line movement since all data is shared across all cores basically. Perhaps Julian can help?
Comment 22•7 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog, Away Aug 7-8) from comment #21) > Could it be because of reasons along the lines of Julian's findings in bug > 1291355 comment 26? [..] Perhaps Julian can help? I can run some low level memory-system (MESI, etc) profiles to see what's going on, if someone can give me STR for the alleged slowdown. In other words, what test case + configuration should I measure?
Flags: needinfo?(ehsan)
Comment 23•7 years ago
|
||
The testcase is https://hsivonen.com/test/moz/encoding_bench_web/english-only.html and then clicking the "Run" button.
Comment 24•7 years ago
|
||
I hope comment 23 answers your question Julian. :-) Thanks a lot for your help!
Flags: needinfo?(ehsan)
Comment 25•7 years ago
|
||
I can't replicate the slowdown, at least using a local copy of the test, which has been set up to run 4 times back-to-back, per hsivonen's guidance. I am comparing the following versions: without (patch) = 366117:638029b68a85 with = 366119:d4bc665ad39b Measured rates (Encoding MiB/sec) 117: 699 700 696 119: 771 772 767 This is on x86_64 Linux, Fedora 25, non-debug build, -g -O2.
Updated•2 years ago
|
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in
before you can comment on or make changes to this bug.
Description
•