Slack hangs indefinitely in a onResize loop
Categories
(Web Compatibility :: Desktop, defect, P1)
Tracking
(Not tracked)
People
(Reporter: Harald, Assigned: emilio)
References
(Depends on 1 open bug)
Details
(Keywords: perf, webcompat:site-wait, Whiteboard: [wfh])
Attachments
(1 file, 1 obsolete file)
(deleted),
text/plain
|
Details |
In my personal profile. I have Fission enabled for a few weeks now and had Slack hang indefinitely around 4 times.
This time it happened when I clicked an external link and managed to remember to profile: https://perfht.ml/3d4vT52
First look suggests an infinite react-state/resize loop.
Assignee | ||
Comment 1•4 years ago
|
||
So I hit this really recently too in #firefox, and took a profile as well: https://perfht.ml/2X0m2Yl
Given our discussion in the thread, this seems likely a Slack bug: https://mozilla.slack.com/archives/C4D3JFF26/p1590296114130000
But would be good to get them involved to either fix it or tell us where is Gecko going wrong :-)
Assignee | ||
Comment 2•4 years ago
|
||
Also, I checked and this also affects 76, so it's probably urgent-ish.
Assignee | ||
Comment 3•4 years ago
|
||
At least for me, as mentioned on that thread, it seems related to the black tooltip that shows up when hovering links. Some of them work, some hang.
Assignee | ||
Comment 4•4 years ago
|
||
And for reference the link I was seeing deterministic hangs on is https://news.ycombinator.com/item?id=23287038.
Assignee | ||
Comment 5•4 years ago
|
||
Ok, multiple people reproduced it just hovering over a message with just that link. At least me (Linux) and other two other folks on Mac.
This is a HiDPI screen, which may be relevant if this is related to pixel rounding (window.devicePixelRatio
is 2).
Comment 6•4 years ago
|
||
I've also experienced this on Windows with a HiDPI screen.
Updated•4 years ago
|
Comment 8•4 years ago
|
||
FWIW when I filed bug 1637425 Firefox with my Slack tab was on a low DPI screen (attached to a laptop with a high DPI display which was also active).
Updated•4 years ago
|
Comment 9•4 years ago
|
||
Just sent an email an engineering contact at Slack.
Comment 10•4 years ago
|
||
I've also contacted someone I know on the engineering team.
Comment 11•4 years ago
|
||
Is there anything we can do on our end here? This bug is really annoying...
Comment 12•4 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)
Also, I checked and this also affects 76, so it's probably urgent-ish.
I just managed to reproduce this all the way back to 64 with mozregression (earlier versions of Firefox just crash when I use them with my profile with mozregression, probably unrelated to this bug). My reproduction steps are simple:
- Send a message to myself in slack with the link Emilio posted in comment 4 (https://news.ycombinator.com/item?id=23287038)
- Hover over the link
- ... tab immediately becomes unresponsive ...
This may well be a bug in slack, but IMO it's still on us to try and fix what's going on. When it happens, it affects not only Slack but my entire Firefox session. This is a serious usability issue -- I'm now at the point of restarting Firefox several times a day because of this bug, if it continues for too much longer I'll probably have no choice but to install the Slack application which I really don't want to do.
Updated•4 years ago
|
Comment 13•4 years ago
|
||
I just hit this by attempting cmd+click of a link... to this bug from a PM. I guess we still don't really understand what the issue is?
(edit: it seems like it's enough to just close the tab, and re-open a new one if you want to avoid a full browser restart. but that's super annoying)
Assignee | ||
Comment 14•4 years ago
|
||
I'm poking at this fwiw. This is the code that doesn't reach an stable state... Still trying to figure out why, but maybe different rounding from getBoundingClientRect and offsetHeight or something of that sort.
Assignee | ||
Comment 15•4 years ago
|
||
It is FP error, btw:
bounds
Object { left: 331.9583435058594, top: 424.54998779296875 }
contentBounds
Object { left: 331.9583282470703, top: 424.54998779296875 }
Assignee | ||
Comment 16•4 years ago
|
||
So, relevant DOM and frame tree:
div@0x7fa55c0198b0 style="position: absolute; left: 331.958px; top: 674.75px; outline: currentcolor none medium; transition-duration: 80ms;" class="ReactModal__Content ReactModal__Content--after-open popover c-popover__content" tabindex="-1" role="presentation" state=[100020800000] flags=[00088402] primaryframe=0x7fa54d3a9178 refcount=7<
div@0x7fa55c019820 role="presentation" state=[100020800000] flags=[00088404] primaryframe=0x7fa54d3a8fe8 refcount=2056<
div@0x7fa55c019790 id="sk-tooltip-340" role="tooltip" class="c-tooltip__tip c-tooltip__tip--top c-tooltip__tip--small c-tooltip__tip--link" data-qa="tooltip-tip" data-sk="tooltip" state=[100020800000] flags=[00088400] primaryframe=0x7fa54d3a9c70 refcount=5<
Text@0x7fa55c017c80 flags=[00300000] primaryframe=0x7fa54d3a98e0 refcount=3<https://news.ycombinator.com/item?id=23287038>
div@0x7fa55c019700 class="c-tooltip__tip__arrow" style="left: 157.158px; right: initial;" state=[100020800000] flags=[00008000] primaryframe=0x7fa54d3a81e0 refcount=2057<>
>
>
>
Block(div)(0)@7fa54d3a9178 parent=7fa54d3a9620 (19918,40485,18859,2640) [state=0042062000d00300] [content=7fa55c0198b0] [cs=7fa54e156b58]<
line 7fa552dd8d30: count=1 state=block,clean,prevmarginclean,not-impacted,not-wrapped,before:nobr,after:nobr[0x8] (0,0,18859,2640) <
Block(div)(0)@7fa54d3a8fe8 parent=7fa54d3a9178 (0,0,18859,2640) [state=0002020000100200] [content=7fa55c019820] [cs=7fa54e156f18]<
line 7fa552dd8e28: count=1 state=inline,clean,prevmarginclean,not-impacted,not-wrapped,before:nobr,after:nobr[0x180] (0,0,18859,2640) <
Block(div)(0)@7fa54d3a9c70 parent=7fa54d3a8fe8 (0,0,18859,2160) vis-overflow=(0,0,18859,2521) scr-overflow=(0,0,18859,2521) normal-position=(0,0) [state=0002062000d00220] [content=7fa55c019790] [cs=7fa551360978]<
line 7fa552dd8f20: count=2 state=inline,clean,prevmarginclean,not-impacted,not-wrapped,before:nobr,after:nobr[0x100] (720,480,17419,1080) <
Text(0)"https://news.ycombinator.com/item?id=23287038"@7fa54d3a98e0 parent=7fa54d3a9c70 next=7fa558aafaf8 (720,540,17419,960) [state=00010000b0600000] [content=7fa55c017c80] [cs=7fa55172ca68:-moz-text] [run=7fa5516b92e0][0,45,T]
Placeholder(div)(1)@7fa558aafaf8 parent=7fa54d3a9c70 (18139,1320,0,0) [state=0000000000200000] [content=7fa55c019700] [cs=7fa55d53fa68:-moz-oof-placeholder] outOfFlowFrame=Block(div)(1)@7fa54d3a81e0
>
AbsoluteList 0x7fa5513c0fc0 <
Block(div)(1)@7fa54d3a81e0 parent=7fa54d3a9c70 (9430,2011,509,509) vis-overflow=(-360,-211,720,721) scr-overflow=(-360,-211,720,721) [state=004026a000d10300] transformed [content=7fa55c019700] [cs=7fa54e159028]<
>
>
>
>
>
>
>
They have a mutation observer for everything in that tooltip. Positioning the tooltip causes a mutation of the style attribute of tooltip arrow's style attribute, with a stack like this:
0 Hn(e = [unavailable], t = [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:847085]
1 Cf(e = [unavailable], t = [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:902745]
2 Fs/<(e = [unavailable], t = [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:915875]
3 () ["self-hosted":948:0]
4 KqkS/t.unstable_runWithPriority(e = "1", t = [function]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:198347]
5 za(e = [unavailable], t = [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:860500]
6 Fs(e = [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:913815]
7 vs(e = [unavailable], [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:909500]
8 vs([unavailable]) ["self-hosted":891:0]
9 Ja/<() ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:860790]
10 KqkS/t.unstable_runWithPriority(e = "1", t = [function]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:198347]
11 za(e = [unavailable], t = [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:860500]
12 Ja() ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:860738]
13 Qa() ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:860671]
14 Is(e = [unavailable], t = [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:906068]
15 enqueueSetState(e = [unavailable], t = [unavailable], n = [unavailable], [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:865578]
16 viRO/A.prototype.setState(e = [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/gantry-vendors.c07873c.min.js?cacheKey=gantry-1591870348":1:1155126]
17 measureContents() ["https://a.slack-edge.com/bv1-8-d2a8b3a/client-boot-imports.3281577.min.js?cacheKey=gantry-1591870348":318220:29]
18 onResize([unavailable], [unavailable]) ["https://a.slack-edge.com/bv1-8-d2a8b3a/client-boot-imports.3281577.min.js?cacheKey=gantry-1591870348":318072:92]
19 onResize([unavailable], [unavailable]) ["self-hosted":844:0]
Which in turn will cause another iteration of onResize
. But the tooltip arrow changes the x
of the result of getBoundingClientRect()
. Here's an iteration of the hang:
// With div@0x7fa55c019700 class="c-tooltip__tip__arrow" style="left: 157.158px; right: initial;" state=[100020800000] flags=[00008000] primaryframe=0x7fa54d3a81e0 refcount=2056<>
getBoundingClientRect's aLayoutRect: (const nsRect &) @0x7fffec8f9830: {<mozilla::gfx::BaseRect<int, nsRect, nsPoint, nsSize, nsMargin>> = {x = 19918, y = 40485, width = 18859, height = 2640}, <No data fields>}
set left aValue: 157.15834045410156px
// After: div@0x7fa55c019700 class="c-tooltip__tip__arrow" style="left: 157.158px; right: initial;" state=[100020800000] flags=[00008000] primaryframe=0x7fa54d3a81e0 refcount=2058<>
// But we report change, because internally we store it with a bit more precision: 157.158325px vs 157.15834px. so onResize will run again!
// With div@0x7fa55c0198b0 style="position: absolute; left: 331.958px; top: 674.75px; outline: currentcolor none medium; transition-duration: 80ms;" class="ReactModal__Content ReactModal__Content--after-open popover c-popover__content" tabindex="-1" role="presentation" state=[100020800000] flags=[00008402] primaryframe=0x7fa54d3a9178 refcount=6<
set left aValue: 331.9583282470703px
// After: div@0x7fa55c0198b0 style="position: absolute; left: 331.958px; top: 674.75px; outline: currentcolor none medium; transition-duration: 80ms;" class="ReactModal__Content ReactModal__Content--after-open popover c-popover__content" tabindex="-1" role="presentation" state=[100020800000] flags=[00088402] primaryframe=0x7fa54d3a9178 refcount=7<
// But again, we report a change, though in this case it doesn't matter much because there's no observer going on.
getBoundingClientRect's aLayoutRect: (const nsRect &) @0x7fffec8f9830: {<mozilla::gfx::BaseRect<int, nsRect, nsPoint, nsSize, nsMargin>> = {x = 19917, y = 40485, width = 18859, height = 2640}, <No data fields>}
// Note how the x coordinate changes a fraction of a CSS pixel. That makes slack come up with different measurements.
Then we try to set left with value 157.1583251953125px and 331.9583435058594px respectively, rinse and repeat.
So this is a minor floating point difference in the style, which causes a minor subpixel difference in layout, and thus in the return value of getBoundingClientRect, which causes them to come up with an slightly different measurement.
So if this doesn't happen in other engines it seems mostly by chance (other engines use different floating point precision in layout, we use 1/60th of a CSS pixel and Blink uses 1/64th).
The floating point shenanigans we do in getBoundingClient rect here may be relevant to some extent, but this seems like an issue with Slack still.
Assignee | ||
Comment 17•4 years ago
|
||
Or maybe other browsers store it with different precision in the style attribute which causes us to fire the mutation observer, but not them...
Assignee | ||
Comment 18•4 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=49c440f54dcff7ef5004747ae4c07442bfed966e seems to "fix" it... Though maybe it just fixed the case I could repro. I think the change makes sense though.
Assignee | ||
Comment 19•4 years ago
|
||
There are some issues with this code. It's using doubles, but SetRect
only takes floats, so we lose precision there already. Instead of
getting fancy, use the same conversion that the rest of Gecko uses, so
that pages that compute style values out of getBoundingClientRect values
don't get minor floating point changes.
Comment 20•4 years ago
|
||
How do we make sure that this patch does not create new bugs where pages rely on the current behavior? However, I'm not sure whether it is even possible to rely on it...
Assignee | ||
Comment 21•4 years ago
|
||
Generally the patch makes rounding consistent between CSSOM and Layout and getBoundingClientRect, so I think it's a strict improvement. Depending on this behavior in such a way is pretty hard, fwiw... Specially since engines have different fixed-point factors.
Comment 22•4 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #18)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=49c440f54dcff7ef5004747ae4c07442bfed966e seems to "fix" it... Though maybe it just fixed the case I could repro. I think the change makes sense though.
I just confirmed it fixes the one link where I could reproduce 100% of the time (on macOS).
Assignee | ||
Comment 23•4 years ago
|
||
(But yeah, still it's a mildly scary change)
Comment 24•4 years ago
|
||
Comment 25•4 years ago
|
||
Backed out for wpt failure on inferred-mrow-baseline.html
Backout link: https://hg.mozilla.org/integration/autoland/rev/ea4566e27ce5b39dcce36a54c11101e831069692
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=306170404&repo=autoland&lineNumber=4114
Comment 26•4 years ago
|
||
Assignee | ||
Comment 27•4 years ago
|
||
Ah, try auto fail, there's my dogfood ;)
Ok, so my patch effectively increases the precision of getBoundingClientRect() in a way that breaks consumer's expectations, because suddenly stuff that should match doesn't because floating point is fun.
Blink and WebKit have a much lower precision for these, because their LayoutUnit fixed-point denominator is 64, so they don't hit crazy fractional units when doing float conversion. We could consider lowering the precision of getBoundingClientRect and co. to be similar to theirs I guess, though I need to think about all the implications...
Assignee | ||
Comment 28•4 years ago
|
||
So after thinking a bit more and talking with Daniel, I've decided to put the current state of affairs here, because there doesn't seem like any of the potential solutions below is either trivial or particularly satisfying.
To recap for the folks that I'm cc'ing, this is effectively a result of nscoord
not being precisely representable as a float, and slack introducing minor floating point errors that cause us to switch layout units back and forth.
When it comes to exposing a layout rect in app units in float precision, we have the following alternatives:
-
Right now, we round to the closest 2^16, so that the results make sense for websites. But that means that minor floating point precision errors from JS code can cause us to move 1 fractional unit or such, which is the cause of this bug.
-
My patch removes that rounding, which means that nscoord values get represented much more closely to the floats we return, making the issue above not happen. However, I suspect that's going to bring more pain than joy, because that means that suddenly results from
getBoundingClientRect
have minor differences like in the tests that fail in comment 25, even though it should be the same value, depending on the order of operations. We could try to push through with this, I guess, though I'm not convinced it's going to be great and it's likely to break some expectations from web developers... -
There's the option to round to a smaller power of two (2^14, etc...). The smaller we go, the easier it gets to make
div.style.left = div.getBoundingClientRect().left + "px";
not round-trip, but the harder it is for JS to cause this kind of bug. -
There's the option to round to a bigger power of two, I guess, increasing the precision of
getBoundingClientRect()
, which makes it closer to the behavior of my patch, but still potentially hitting these cases which isn't great. It might be worth a shot? I need to think harder about this. -
There's the option to switch representation of layout units to something that is precisely representable as a float. Choosing a power of two like Blink / WebKit does also has the benefit of faster multiplication and division. It loses the properties outlined in bug 177805 comment 7 and such, which is a bit unfortunate. In any case that is not a call I can make alone.
cc'ing some folks which may have opinions on all the issues above...
Still the easiest fix is for slack to drop a few toFixed(2)
s around the code in attachment 9155947 [details]... :-)
Assignee | ||
Comment 29•4 years ago
|
||
Rounding to a smaller power of two may be the easiest / safest short-term solution if we need to fix this bug ASAP.
Comment 32•4 years ago
|
||
I just got news via a Slack contact that this has been fixed -- I've just verified after clearing the cache (via cmd + opt + r) that it no longer reproduces on the 100% reproducible case I had. In case it's useful for someone in the future, Slack said:
You may need to unregister your service worker if you are still experiencing the issue.
Comment 33•4 years ago
|
||
Emilio, when you get a chance, can you verify the fix on your ycombinator link?
Comment 34•4 years ago
|
||
Actually, I should have asked Harald. :)
Updated•4 years ago
|
Reporter | ||
Comment 35•4 years ago
|
||
No hangs for that case (but Slack performance has not been great today either).
Updated•4 years ago
|
Comment 36•4 years ago
|
||
I can't tell if I am hitting it here (Slack tab stuck with a slow script warning) because I didn't force-reload my Slack Service Worker or if this is another issue but I took a profile anyway: https://share.firefox.dev/2ACqWm1
Assignee | ||
Comment 37•4 years ago
|
||
This seems to have regressed again :(
https://share.firefox.dev/3gpLUTJ
Mike, can you reach out to slack? I can reproduce if hover over a message that says iszoomdownforeveryoneorjustme.com
(with the backticks / quoted code font).
Comment 38•4 years ago
|
||
Yep, I just dropped a message in our shared channel (and added you).
Updated•3 years ago
|
Description
•