Closed
Bug 1210465
Opened 9 years ago
Closed 9 years ago
Issues with test_basic_pan.html
Categories
(Core :: Panning and Zooming, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox44 | --- | affected |
People
(Reporter: kats, Assigned: kats)
References
Details
Investigation bug for all the various intermittent failures with test_basic_pan.html which seem to have increased recently. I suspect a few of them have the same underlying cause.
Assignee | ||
Comment 1•9 years ago
|
||
Try push with some logging enabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e35818ef4683. My plan is to retrigger test_basic_pan until it hits one or more of these failures and use the logging to find out roughly what's going on, and dig in from there.
Comment 2•9 years ago
|
||
If I understand what's intended, the mach command to push to try should be an effective workaround for bug 1183144: running |./mach try --tag apz -p emulator| will run the relevant tests in a single chunk (it will appear as M(1)).
Assignee | ||
Comment 3•9 years ago
|
||
Indeed, the try push I have above seems to provide the results that I need. Removing dependency on that bug.
The failure from bug 1208200 showed up in my retriggers and from the log it looks like the touch events are synthesized just fine, but the main thread timeout is hit before the div gets layerized, and so the touch events end up scrolling the root document's APZ instead. This should be avoidable by increasing the content response timeout. In the log at [1] the timeout is installed at 20:19:29.244 and the response arrives at 20:19:47.784 which is 18.54 seconds later. This exceeds the 15 second timeout that we use in the testing environment, causing the problem. That being said, I'm not sure what was happening between 20:19:30.124 and 20:19:40.604 since that's a 10 second gap with no output in logcat. It could be that there's some underlying bug that caused the main thread to stop pumping events.
[1] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/6801cf4314f00cbd908cf01f3a4422cf93d4b6aaff8fd676fa785a2b4f4f89b9006b51217c69630ac9b79e1b740564de8592fa80ea0b95878bf7edf5afe8e32b
No longer depends on: 1183144
Assignee | ||
Comment 4•9 years ago
|
||
In the log at [2] the same thing happens for the iframe, and there it takes just under 50 seconds for the content response to get back!
[2] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/b977c422e5823bb2f36a3a113ca8d193d79896935fa55317f127992242ddfdbbd063ecf54913a8fdd730a30aea5ff1496b894ca7a1ae4974454ff38fd9f3ab36
Assignee | ||
Comment 5•9 years ago
|
||
Bug 1203901 also showed up in the retriggers, e.g. log at [3]. The reason it times out is because the touch events are delivered to the root APZ in the root process, so no scroll occurs and presumably no APZ:TransformEnd notification is dispatched in the child process. If we look at the hit-testing tree dump we can see that the child process layers (l=3) haven't shown up in the tree yet. This is evident if you compare it to the equivalent tree dump in the logs [1] or [2] from my previous comments. So likely there's a race here where the "onload" script runs before the compositor actually has the layers from the page that got loaded. We can probably fix this by doing a flushApzRepaints or some other equivalent wait-for-paint magic before synthesizing the drag.
[3] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/4630d8afe7fe3fb6b013d8a0fd2ecac92b376ee85b89b41e0e9a3380e9b9681b3802170c2ecac94705661b1c24b40070ca96e81b40be5ecdd9ef4ac27d9bb992
Assignee | ||
Comment 6•9 years ago
|
||
I suspect both bug 1208973 and bug 1209444 are due to improper multithreaded access of mQueuedTask in TaskThrottler. These crashes both started after the recent changes to TaskThrottler. There's a log at [4] with such a crash but it doesn't help that much.
[4] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/01abdc9cfd862deb17f186b59763e6e16763976d17d8c8335448b49b63481eb6ac86086e9fea490bf0b086e3bf585852c2f32c143cbee2e16819842d72e262fb
Assignee | ||
Comment 7•9 years ago
|
||
I did another try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=0a8b377ccfc0 with what I hope is a fix for bug 1203901 and more logging for the other issues.
Assignee | ||
Comment 8•9 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> In the log at [2] the same thing happens for the iframe, and there it takes
> just under 50 seconds for the content response to get back!
Log at [5] has more logging, it shows that the APZ deals with the touchstart at around 18:28:34.691 and the TabChild doesn't see it until 18:29:02.152. That's 28 seconds just to get the event to the child process. It doesn't seem to get stalled anywhere in particular, so I think this is just SLOW. Admittedly there's a bunch of other stuff happening (the drag event generates a lot of touch events) so if this is running on a slow machine that might explain it.
[5] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/5f672c2d6bb4f9204651bdb3ef46468bee3dee02dd77daf224a58cea113e2e2c0af11697a013a5a03d382810feb05aa0700694457b633de44f51dde5d8555f9f
Comment 9•9 years ago
|
||
Wow - and I thought 15 seconds was an eternity to begin with!
Assignee | ||
Comment 10•9 years ago
|
||
Yeah :(
The new try push has a few "test timed out" as well - for some of them there's a new issue with the drag turning into a long-press. We can fix that by increasing the long-press threshold timeout. And there's still an issue with the layer tree not being correct at the time the touchstart comes in, so I guess my patch didn't really work. It might need an APZ flush as well.
Assignee | ||
Comment 11•9 years ago
|
||
Assignee | ||
Comment 12•9 years ago
|
||
The retriggers on this are showing a couple of failures. One is the same "layer tree not updated" problem. I fixed it for helper_div_pan and helper_iframe_pan but forgot to do it for helper_basic_pan.
The crash from bug 1208973 also showed up and as I expected the log [6] indicates that it's probably a concurrency error. Specifically the task 0x46872780 gets run twice in the same throttler, which should never happen. I suspect that the second of those tasks ends up sending a corrupted framemetrics which then causes an assertion failure on the child send when it tries to unpack it. Adding some locking around this should fix it.
[6] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/ecf36972bf0a5fddc3432da6e9d3fe799eb019e521df230fceb0133f10e9c10ddfe7f9f2e6dc63b8d88a96eb64330821cfb8f51edec74368a8b4ee9dbe438153
Assignee | ||
Comment 13•9 years ago
|
||
Looks like round 4 is a winner: https://treeherder.mozilla.org/#/jobs?repo=try&revision=917d12e6eebb&group_state=expanded
Assignee | ||
Comment 14•9 years ago
|
||
My investigation is complete; I have attached patches to 3 of the 5 dependent bugs. The other two are probably dupes.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•